
graph-tool-call

Graph-based tool retrieval for LLM agents. Builds a tool graph from OpenAPI/MCP specs and retrieves multi-step workflows via hybrid search (BM25 + graph traversal + embedding), recovering accuracy from 12% to 82% with 79% fewer tokens. Also works as an MCP Proxy to aggregate multiple servers behind 3 meta-tools.

glama · Stars: 4 · Forks: 1 · Updated: Mar 16, 2026 · Validated: Mar 17, 2026


LLM agents can't fit thousands of tool definitions into context.
Vector search finds similar tools, but misses the workflow they belong to.
graph-tool-call builds a tool graph and retrieves the right chain — not just one match.


| Scenario | Baseline (all tools) | graph-tool-call |
|---|---|---|
| 248 tools (K8s API) | 12% accuracy | 82% accuracy |
| Token usage (K8s) | 8,192 tokens | 1,699 tokens (79% reduction) |
| 50 tools (GitHub API) | 100% accuracy | 87.5% accuracy, 88% fewer tokens (retrieve-k5) |

Measured with qwen3:4b (4-bit) — full benchmark below



License: MIT · Python 3.10+ · Zero dependencies



The Problem

LLM agents need tools. But as tool count grows, two things break:

  1. Context overflow — 248 Kubernetes API endpoints = 8,192 tokens of tool definitions. The LLM chokes and accuracy drops to 12%.
  2. Vector search misses workflows — Searching "cancel my order" finds cancelOrder, but the actual flow is listOrders → getOrder → cancelOrder → processRefund. Vector search returns one tool; you need the chain.

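graph-tool-call solves both. It models tool relationships as a graph and retrieves multi-step workflows via hybrid search (BM25 + graph traversal + embedding + MCP annotations). In the benchmarks below, passing only the top 3 or 5 retrieved tools cuts token usage by 64–91%, keeps accuracy close to baseline on small APIs, and raises it from 12% to 78–82% on a 248-tool API.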

At a Glance

| What you get | How |
|---|---|
| Workflow-aware retrieval | Graph edges encode PRECEDES, REQUIRES, COMPLEMENTARY relations |
| Hybrid search | BM25 + graph traversal + embedding + MCP annotations, fused via wRRF |
| Zero dependencies | Core runs on Python stdlib only; add extras as needed |
| Any tool source | Auto-ingest from OpenAPI / Swagger / MCP / Python functions |
| History-aware | Previously called tools are demoted; next-step tools are boosted |
| MCP Proxy | 172 tools across servers → 3 meta-tools, saving ~1,200 tokens/turn |

Why Not Just Vector Search?

| Scenario | Vector-only | graph-tool-call |
|---|---|---|
| "cancel my order" | Returns cancelOrder | listOrders → getOrder → cancelOrder → processRefund |
| "read and save file" | Returns read_file | read_file + write_file (COMPLEMENTARY relation) |
| "delete old records" | Returns any tool matching "delete" | Destructive tools ranked first via MCP annotations |
| "now cancel it" (after listing orders) | No context from history | Demotes used tools, boosts next-step tools |
| Multiple Swagger specs with overlapping tools | Duplicate tools in results | Cross-source auto-deduplication |
| 1,200 API endpoints | Slow, noisy results | Categorized + graph traversal for precise retrieval |

How It Works

OpenAPI / MCP / Python functions → Ingest → Build tool graph → Hybrid retrieve → Agent

Example: User says "cancel my order and process a refund"

Vector search finds cancelOrder. But the actual workflow is:

                    ┌──────────┐
          PRECEDES  │listOrders│  PRECEDES
         ┌─────────┤          ├──────────┐
         ▼         └──────────┘          ▼
   ┌──────────┐                    ┌───────────┐
   │ getOrder │                    │cancelOrder│
   └──────────┘                    └─────┬─────┘
                                        │ COMPLEMENTARY
                                        ▼
                                 ┌──────────────┐
                                 │processRefund │
                                 └──────────────┘

graph-tool-call returns the entire chain, not just one tool. Retrieval combines four signals via weighted Reciprocal Rank Fusion (wRRF):

  • BM25 — keyword matching
  • Graph traversal — relation-based expansion (PRECEDES, REQUIRES, COMPLEMENTARY)
  • Embedding similarity — semantic search (optional, any provider)
  • MCP annotations — read-only / destructive / idempotent hints
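The fusion step can be sketched in a few lines of Python. This is a minimal sketch assuming the textbook weighted RRF form, score(tool) = Σ w_signal / (k + rank_signal(tool)), with the conventional smoothing constant k = 60; graph-tool-call's actual constant, normalization, and tie-breaking are not documented here and may differ. The function name weighted_rrf is hypothetical; the signal names mirror the keyword/graph keys accepted by set_weights().

```python
from collections import defaultdict


def weighted_rrf(
    rankings: dict[str, list[str]], weights: dict[str, float], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists with weighted Reciprocal Rank Fusion.

    score(tool) = sum over signals of weights[signal] / (k + rank), rank is 1-based.
    A tool absent from a signal's list receives nothing from that signal.
    """
    scores: dict[str, float] = defaultdict(float)
    for signal, ranked in rankings.items():
        w = weights.get(signal, 0.0)
        for rank, tool in enumerate(ranked, start=1):
            scores[tool] += w / (k + rank)
    # Highest fused score first; ties broken by name for determinism
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# BM25 only matched the query words; graph traversal expanded to the workflow
rankings = {
    "keyword": ["cancelOrder", "getOrder"],
    "graph": ["listOrders", "getOrder", "cancelOrder", "processRefund"],
}
fused = weighted_rrf(rankings, {"keyword": 0.5, "graph": 0.5})
print([name for name, _ in fused])
# ['cancelOrder', 'getOrder', 'listOrders', 'processRefund']
```

Because RRF uses ranks rather than raw scores, BM25 scores and graph-traversal scores never need to be put on a common scale; only their relative order within each signal matters.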

Installation

The core package has zero dependencies — just Python standard library. Install only what you need:

pip install graph-tool-call                     # core (BM25 + graph), no dependencies
pip install "graph-tool-call[embedding]"        # + embedding, cross-encoder reranker
pip install "graph-tool-call[openapi]"          # + YAML support for OpenAPI specs
pip install "graph-tool-call[mcp]"              # + MCP server mode
pip install "graph-tool-call[all]"              # everything
All extras

| Extra | Installs | When to use |
|---|---|---|
| openapi | pyyaml | YAML OpenAPI specs |
| embedding | numpy | Semantic search (connect to Ollama/OpenAI/vLLM) |
| embedding-local | numpy, sentence-transformers | Local sentence-transformers models |
| similarity | rapidfuzz | Duplicate detection |
| langchain | langchain-core | LangChain integration |
| visualization | pyvis, networkx | HTML graph export, GraphML |
| dashboard | dash, dash-cytoscape | Interactive dashboard |
| lint | ai-api-lint | Auto-fix bad API specs |
| mcp | mcp | MCP server mode |

pip install "graph-tool-call[lint]"
pip install "graph-tool-call[similarity]"
pip install "graph-tool-call[visualization]"
pip install "graph-tool-call[dashboard]"
pip install "graph-tool-call[langchain]"

Quick Start

Try it in 30 seconds (nothing to install beyond uv)

uvx graph-tool-call search "user authentication" \
  --source https://petstore.swagger.io/v2/swagger.json
Query: "user authentication"
Source: https://petstore.swagger.io/v2/swagger.json (19 tools)
Results (5):

  1. getUserByName
     Get user by user name
  2. deleteUser
     Delete user
  3. createUser
     Create user
  4. loginUser
     Logs user into the system
  5. updateUser
     Updated user

Python API

from graph_tool_call import ToolGraph

# Build a tool graph from the official Petstore API
tg = ToolGraph.from_url(
    "https://petstore3.swagger.io/api/v3/openapi.json",
    cache="petstore.json",
)

print(tg)
# → ToolGraph(tools=19, nodes=22, edges=100)

# Search for tools
tools = tg.retrieve("create a new pet", top_k=5)
for t in tools:
    print(f"{t.name}: {t.description}")

# Search with workflow guidance
results = tg.retrieve_with_scores("process an order", top_k=5)
for r in results:
    print(f"{r.tool.name} [{r.confidence}]")
    for rel in r.relations:
        print(f"  → {rel.hint}")

# Execute an API directly (OpenAPI tools)
result = tg.execute(
    "addPet", {"name": "Buddy", "status": "available"},
    base_url="https://petstore3.swagger.io/api/v3",
)

MCP Server (Claude Code, Cursor, Windsurf, etc.)

Run as an MCP server — any MCP-compatible agent can use tool search with just a config entry:

// .mcp.json
{
  "mcpServers": {
    "tool-search": {
      "command": "uvx",
      "args": ["graph-tool-call[mcp]", "serve",
               "--source", "https://api.example.com/openapi.json"]
    }
  }
}

The server exposes 6 tools: search_tools, get_tool_schema, execute_tool, list_categories, graph_info, load_source.

Search results include workflow guidance — relations between tools and suggested execution order:

{
  "tools": [
    {"name": "createOrder", "relations": [
      {"target": "getOrder", "type": "precedes", "hint": "Call this tool before getOrder"}
    ]},
    {"name": "getOrder", "prerequisites": ["createOrder"]}
  ],
  "workflow": {"suggested_order": ["createOrder", "getOrder", "updateOrderStatus"]}
}
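A client consuming this response can turn the relations into an execution order. The sketch below uses the standard-library graphlib to topologically sort the returned tools, treating each "precedes" relation and each prerequisites entry as an ordering constraint. The function name execution_order is hypothetical, and this is not necessarily how the server computes suggested_order.

```python
from graphlib import TopologicalSorter

# Shape copied from the search_tools response example above
response = {
    "tools": [
        {"name": "createOrder", "relations": [
            {"target": "getOrder", "type": "precedes", "hint": "Call this tool before getOrder"}
        ]},
        {"name": "getOrder", "prerequisites": ["createOrder"]},
    ],
}


def execution_order(resp: dict) -> list[str]:
    """Order tools so that every prerequisite and 'precedes' source comes first."""
    ts: TopologicalSorter = TopologicalSorter()
    for tool in resp["tools"]:
        name = tool["name"]
        # add(node, *predecessors): predecessors must be emitted before node
        ts.add(name, *tool.get("prerequisites", []))
        for rel in tool.get("relations", []):
            if rel["type"] == "precedes":
                ts.add(rel["target"], name)  # target depends on this tool
    return list(ts.static_order())  # raises graphlib.CycleError on cycles


print(execution_order(response))  # ['createOrder', 'getOrder']
```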

MCP Proxy (aggregate multiple MCP servers)

When you have many MCP servers, their tool names pile up in every LLM turn. MCP Proxy bundles them behind a single server — 172 tools → 3 meta-tools, saving ~1,200 tokens per turn.

Step 1. Create backends.json with your existing MCP servers:

// ~/backends.json
{
  "backends": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp", "--headless"]
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home"]
    },
    "my-api": {
      "command": "uvx",
      "args": ["some-mcp-server"],
      "env": { "API_KEY": "sk-..." }
    }
  },
  "top_k": 10,
  "cache_path": "~/.cache/mcp-proxy-cache.json"
}

Embedding is optional. Add "embedding": "ollama/qwen3-embedding:0.6b" for cross-language search (requires Ollama running). Without it, BM25 keyword search still works.

Step 2. Register the proxy with Claude Code:

claude mcp add -s user tool-proxy -- \
  uvx "graph-tool-call[mcp]" proxy --config ~/backends.json

Step 3. Remove the original individual servers (so they don't duplicate):

claude mcp remove playwright -s user
claude mcp remove filesystem -s user
claude mcp remove my-api -s user

Step 4. Restart Claude Code and verify:

claude mcp list
# tool-proxy: ... - ✓ Connected
# (individual servers should be gone)

That's it. The proxy exposes search_tools, get_tool_schema, and call_backend_tool. After searching, matched tools are dynamically injected for 1-hop direct calling.

Alternative: .mcp.json config
// .mcp.json (project-level or global)
{
  "mcpServers": {
    "tool-proxy": {
      "command": "uvx",
      "args": ["graph-tool-call[mcp]", "proxy",
               "--config", "/path/to/backends.json"]
    }
  }
}

Direct Integration (OpenAI, Ollama, vLLM, Azure, etc.)

Use retrieve() to search, then convert to OpenAI function-calling format. Works with any OpenAI-compatible API:

from openai import OpenAI
from graph_tool_call import ToolGraph
from graph_tool_call.langchain.tools import tool_schema_to_openai_function

# Build graph from any source
tg = ToolGraph.from_url(
    "https://petstore3.swagger.io/api/v3/openapi.json",
    cache="petstore.json",
)

# Retrieve only the relevant tools for a query
tools = tg.retrieve("create a new pet", top_k=5)

# Convert to OpenAI function-calling format
openai_tools = [
    {"type": "function", "function": tool_schema_to_openai_function(t)}
    for t in tools
]

# Use with any provider — OpenAI, Azure, Ollama, vLLM, llama.cpp, etc.
client = OpenAI()  # or OpenAI(base_url="http://localhost:11434/v1", api_key="ollama") for Ollama
response = client.chat.completions.create(
    model="gpt-4o",
    tools=openai_tools,  # only the 5 most relevant tools instead of all 19 Petstore tools
    messages=[{"role": "user", "content": "create a new pet"}],
)
Anthropic Claude API
from anthropic import Anthropic
from graph_tool_call import ToolGraph

tg = ToolGraph.from_url("https://api.example.com/openapi.json")
tools = tg.retrieve("cancel an order", top_k=5)

# Convert to Anthropic tool format
anthropic_tools = [
    {
        "name": t.name,
        "description": t.description,
        "input_schema": {
            "type": "object",
            "properties": {
                p.name: {"type": p.type, "description": p.description}
                for p in t.parameters
            },
            "required": [p.name for p in t.parameters if p.required],
        },
    }
    for t in tools
]

client = Anthropic()
response = client.messages.create(
    model="claude-sonnet-4-20250514",
    tools=anthropic_tools,
    messages=[{"role": "user", "content": "cancel my order"}],
    max_tokens=1024,
)

SDK Middleware (zero code changes)

Already have tool-calling code? Add one line to automatically filter tools:

from openai import OpenAI
from graph_tool_call import ToolGraph
from graph_tool_call.middleware import patch_openai

client = OpenAI()
tg = ToolGraph.from_url("https://api.example.com/openapi.json")

patch_openai(client, graph=tg, top_k=5)  # ← add this line

# Existing code unchanged: all 248 tools go in, only the 5 most relevant are sent.
# all_248_tools and messages come from your existing application code.
response = client.chat.completions.create(
    model="gpt-4o",
    tools=all_248_tools,
    messages=messages,
)

# Also works with Anthropic (patch the Anthropic client instead)
from anthropic import Anthropic
from graph_tool_call.middleware import patch_anthropic

anthropic_client = Anthropic()
patch_anthropic(anthropic_client, graph=tg, top_k=5)
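Conceptually, a middleware like this intercepts the create call and swaps the full tool list for a filtered subset. The sketch below shows that wrapping pattern with a hypothetical patch_create helper, a toy keyword_select scorer standing in for ToolGraph retrieval, and a stub object standing in for client.chat.completions. It is not the library's implementation, and it assumes the user message content is a plain string.

```python
from types import SimpleNamespace


def keyword_select(query: str, tools: list[dict]) -> list[dict]:
    """Toy relevance: rank tools by words shared between query and description."""
    words = set(query.lower().split())

    def overlap(tool: dict) -> int:
        return len(words & set(tool["function"]["description"].lower().split()))

    return sorted(tools, key=overlap, reverse=True)


def patch_create(completions, select_tools, top_k: int = 5):
    """Wrap completions.create so only the top_k selected tools are forwarded."""
    original = completions.create

    def create(*args, **kwargs):
        tools = kwargs.get("tools")
        if tools:
            # Use the latest user message as the retrieval query
            query = next((m["content"] for m in reversed(kwargs.get("messages", []))
                          if m.get("role") == "user"), "")
            kwargs["tools"] = select_tools(query, tools)[:top_k]
        return original(*args, **kwargs)

    completions.create = create
    return completions


def fn(name: str, description: str) -> dict:
    return {"type": "function", "function": {"name": name, "description": description}}


all_tools = [fn("listPods", "List all pods"), fn("cancelOrder", "Cancel an order"),
             fn("deletePod", "Delete a pod")]
# Stub standing in for client.chat.completions: it echoes the arguments it receives
completions = SimpleNamespace(create=lambda **kwargs: kwargs)
patch_create(completions, keyword_select, top_k=1)
sent = completions.create(model="gpt-4o", tools=all_tools,
                          messages=[{"role": "user", "content": "please cancel my order"}])
print([t["function"]["name"] for t in sent["tools"]])  # ['cancelOrder']
```

Wrapping the method rather than subclassing the client is what lets existing call sites stay unchanged.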

LangChain Integration

pip install "graph-tool-call[langchain]"
from graph_tool_call import ToolGraph
from graph_tool_call.langchain import GraphToolRetriever

tg = ToolGraph.from_url("https://api.example.com/openapi.json")

# Use as a LangChain retriever — compatible with any chain/agent
retriever = GraphToolRetriever(tool_graph=tg, top_k=5)
docs = retriever.invoke("cancel an order")

for doc in docs:
    print(doc.page_content)       # "cancelOrder: Cancel an existing order"
    print(doc.metadata["tags"])   # ["order"]

Benchmark

The benchmark answers two questions:

  1. Can performance be maintained or improved by giving the LLM only a subset of retrieved tools?
  2. Does the retriever itself rank the correct tools within the top K?

The evaluation compared the following configurations on the same set of user requests.

  • baseline: pass all tool definitions to the LLM as-is
  • retrieve-k3 / k5 / k10: pass only the top K retrieved tools
  • + embedding / + ontology: add semantic search and LLM-based ontology enrichment on top of retrieve-k5

The model used was qwen3:4b (4-bit, Ollama).

Evaluation Metrics

  • Accuracy: Did the LLM ultimately select the correct tool?
  • Recall@K: Was the correct tool included in the top K results at the retrieval stage?
  • Avg tokens: Average tokens passed to the LLM
  • Token reduction: Token savings compared to baseline
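The two retrieval-side metrics are simple to compute. A minimal sketch, assuming one gold tool per query and token counts already averaged per query; the function names and sample queries are hypothetical. The benchmark harness may aggregate differently, so recomputing from the averaged table values can differ in the last digit (for example, 440 vs 1,239 tokens gives 64.5% rather than the reported 64.4%).

```python
def gold_recall_at_k(retrieved: list[list[str]], gold: list[str], k: int) -> float:
    """Fraction of queries whose gold tool appears in the top-k retrieved tools."""
    hits = sum(g in ranked[:k] for ranked, g in zip(retrieved, gold))
    return hits / len(gold)


def token_reduction(baseline_tokens: float, pipeline_tokens: float) -> float:
    """Percentage of prompt tokens saved relative to the baseline."""
    return 100.0 * (1.0 - pipeline_tokens / baseline_tokens)


# Two hypothetical queries with their ranked retrieval results
retrieved = [["getOrder", "cancelOrder", "listOrders"],
             ["listPods", "readPod", "deletePod"]]
gold = ["cancelOrder", "deletePod"]

print(gold_recall_at_k(retrieved, gold, k=1))   # 0.0
print(gold_recall_at_k(retrieved, gold, k=3))   # 1.0
print(round(token_reduction(8192, 1613), 1))    # 80.3 (Kubernetes retrieve-k5)
```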

Results at a glance

  • Small-scale APIs (19 to 50 tools): baseline is already strong. In this range, graph-tool-call's main value is a 64–91% token saving (retrieve-k3/k5) with accuracy close to baseline.
  • Large-scale APIs (248 tools): baseline collapses to 12%, while graph-tool-call maintains 78–82% accuracy. At this scale retrieval is less an optimization than a required layer.
Full pipeline comparison

How to read the metrics

  • End-to-end Accuracy: Did the LLM ultimately succeed in selecting the correct tool or performing the correct workflow?
  • Gold Tool Recall@K: Was the canonical gold tool designated as the correct answer included in the top K at the retrieval stage?
  • These two metrics measure different things, so they don't always match.
  • In particular, evaluations that accept alternative tools or equivalent workflows as correct answers may show End-to-end Accuracy that doesn't exactly match Gold Tool Recall@K.
  • baseline has no retrieval stage, so Gold Tool Recall@K does not apply.
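| Dataset | Tools | Pipeline | End-to-end Accuracy | Gold Tool Recall@K | Avg tokens | Token reduction |
|---|---|---|---|---|---|---|
| Petstore | 19 | baseline | 100.0% | n/a | 1,239 | n/a |
| Petstore | 19 | retrieve-k3 | 90.0% | 93.3% | 305 | 75.4% |
| Petstore | 19 | retrieve-k5 | 95.0% | 98.3% | 440 | 64.4% |
| Petstore | 19 | retrieve-k10 | 100.0% | 98.3% | 720 | 41.9% |
| GitHub | 50 | baseline | 100.0% | n/a | 3,302 | n/a |
| GitHub | 50 | retrieve-k3 | 85.0% | 87.5% | 289 | 91.3% |
| GitHub | 50 | retrieve-k5 | 87.5% | 87.5% | 398 | 87.9% |
| GitHub | 50 | retrieve-k10 | 90.0% | 92.5% | 662 | 79.9% |
| Mixed MCP | 38 | baseline | 96.7% | n/a | 2,741 | n/a |
| Mixed MCP | 38 | retrieve-k3 | 86.7% | 93.3% | 328 | 88.0% |
| Mixed MCP | 38 | retrieve-k5 | 90.0% | 96.7% | 461 | 83.2% |
| Mixed MCP | 38 | retrieve-k10 | 96.7% | 100.0% | 826 | 69.9% |
| Kubernetes core/v1 | 248 | baseline | 12.0% | n/a | 8,192 | n/a |
| Kubernetes core/v1 | 248 | retrieve-k5 | 78.0% | 91.0% | 1,613 | 80.3% |
| Kubernetes core/v1 | 248 | retrieve-k5 + embedding | 80.0% | 94.0% | 1,728 | 78.9% |
| Kubernetes core/v1 | 248 | retrieve-k5 + ontology | 82.0% | 96.0% | 1,699 | 79.3% |
| Kubernetes core/v1 | 248 | retrieve-k5 + embedding + ontology | 82.0% | 98.0% | 1,924 | 76.5% |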

How to read this table

  • baseline is the result of passing all tool definitions to the LLM without any retrieval.
  • retrieve-k variants pass only a subset of retrieved tools to the LLM, so both retrieval quality and LLM selection ability affect performance.
  • Therefore, a baseline accuracy of 100% does not mean retrieve-k accuracy must also be 100%.
  • Gold Tool Recall@K measures whether retrieval placed the canonical gold tool in the top-k, while End-to-end Accuracy measures whether the final task execution succeeded.
  • Because of this, evaluations that accept alternative tools or equivalent workflows may show the two values not exactly matching.

Key insights

  • Petstore / GitHub / Mixed MCP: When tool count is small or medium, baseline is already strong. In this range, graph-tool-call's main value is significantly reducing tokens without much accuracy loss.
  • Kubernetes core/v1 (248 tools): When tool count is large, baseline collapses due to context overload. graph-tool-call recovers performance from 12.0% to 78.0–82.0% by narrowing candidates through retrieval.
  • In practice, retrieve-k5 is the best default. It offers a good balance of token efficiency and performance. On large datasets, adding embedding / ontology yields further improvement.

Retrieval performance: Does the retriever find the correct tools in the top K?

The table below measures the quality of retrieval itself, before the LLM stage. Only BM25 + graph traversal were used here — no embedding or ontology.

How to read the metrics

  • Gold Tool Recall@K: Was the canonical gold tool designated as the correct answer included in the top K at the retrieval stage?
  • This table shows how well the retriever constructs the candidate set, not the final LLM selection accuracy.
  • Therefore, this table should be read together with the End-to-end Accuracy table above.
  • Even if retrieval places the gold tool in the top-k, the final LLM doesn't always select the correct answer.
  • Conversely, in end-to-end evaluations that accept alternative tools or equivalent workflows as correct, the final accuracy and gold recall may not exactly match.
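| Dataset | Tools | Gold Tool Recall@3 | Gold Tool Recall@5 | Gold Tool Recall@10 |
|---|---|---|---|---|
| Petstore | 19 | 93.3% | 98.3% | 98.3% |
| GitHub | 50 | 87.5% | 87.5% | 92.5% |
| Mixed MCP | 38 | 93.3% | 96.7% | 100.0% |
| Kubernetes core/v1 | 248 | 82.0% | 91.0% | 92.0% |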

How to read this table

  • Gold Tool Recall@K shows the retriever's ability to include the correct tool in the candidate set.
  • On small datasets, k=5 alone achieves high recall.
  • On large datasets, increasing k raises recall, but also increases the tokens passed to the LLM.
  • In practice, you should consider not just recall but also token cost and final end-to-end accuracy together.

Key insights

  • Petstore / Mixed MCP: k=5 alone includes nearly all correct tools in the candidate set.
  • GitHub: There is a recall gap between k=5 and k=10, so k=10 may be better if higher recall is needed.
  • Kubernetes core/v1: Even with a large number of tools, k=5 already achieves 91.0% gold recall. The retrieval stage alone can significantly compress the candidate set while retaining most correct tools.
  • Overall, retrieve-k5 is the most practical default. k=3 is lighter but may miss some correct tools, while k=10 costs roughly 60–80% more tokens than k=5 (Petstore, GitHub, Mixed MCP) for recall gains of 0 to 5 points.

When do embedding and ontology help?

On the largest dataset, Kubernetes core/v1 (248 tools), we compared adding extra signals on top of retrieve-k5.

| Pipeline | End-to-end Accuracy | Gold Tool Recall@5 | Interpretation |
|---|---|---|---|
| retrieve-k5 | 78.0% | 91.0% | BM25 + graph alone is a strong baseline |
| + embedding | 80.0% | 94.0% | Recovers queries that are semantically similar but differently worded |
| + ontology | 82.0% | 96.0% | LLM-generated keywords/example queries significantly improve retrieval quality |
| + embedding + ontology | 82.0% | 98.0% | Accuracy maintained, gold recall at its highest |

Summary

  • Embedding compensates for semantic similarity that BM25 misses.
  • Ontology expands the searchable representation itself when tool descriptions are short or non-standard.
  • Using both together does not raise end-to-end accuracy beyond ontology alone (82.0% in both cases), but it gives the highest gold recall (98.0%), so the candidate set is most likely to contain the correct tool.

Reproduce it

# Retrieval quality (fast, no LLM needed)
python -m benchmarks.run_benchmark
python -m benchmarks.run_benchmark -d k8s -v

# Pipeline benchmark (LLM comparison)
python -m benchmarks.run_benchmark --mode pipeline -m qwen3:4b
python -m benchmarks.run_benchmark --mode pipeline --pipelines baseline retrieve-k3 retrieve-k5 retrieve-k10

# Save baseline and compare
python -m benchmarks.run_benchmark --mode pipeline --save-baseline
python -m benchmarks.run_benchmark --mode pipeline --diff

Basic Usage

From OpenAPI / Swagger

from graph_tool_call import ToolGraph

# From file (JSON / YAML)
tg = ToolGraph()
tg.ingest_openapi("path/to/openapi.json")

# From URL — auto-discovers all spec groups from Swagger UI
tg = ToolGraph.from_url("https://api.example.com/swagger-ui/index.html")

# With caching — build once, reload instantly
tg = ToolGraph.from_url(
    "https://api.example.com/swagger-ui/index.html",
    cache="my_api.json",
)

# Supports: Swagger 2.0, OpenAPI 3.0, OpenAPI 3.1

From MCP Server Tools

from graph_tool_call import ToolGraph

mcp_tools = [
    {
        "name": "read_file",
        "description": "Read a file",
        "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
        "annotations": {"readOnlyHint": True, "destructiveHint": False},
    },
    {
        "name": "delete_file",
        "description": "Delete a file permanently",
        "inputSchema": {"type": "object", "properties": {"path": {"type": "string"}}},
        "annotations": {"readOnlyHint": False, "destructiveHint": True},
    },
]

tg = ToolGraph()
tg.ingest_mcp_tools(mcp_tools, server_name="filesystem")

tools = tg.retrieve("delete temporary files", top_k=5)

MCP annotations (readOnlyHint, destructiveHint, idempotentHint, openWorldHint) are used as retrieval signals. Query intent is automatically classified — read queries prioritize read-only tools, delete queries prioritize destructive tools.
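A rough sketch of annotation-aware ranking, assuming a simple keyword-based intent classifier and a fixed score multiplier. The word lists, the 1.5 boost, and the function names are hypothetical; the README does not describe the library's actual classifier or weighting.

```python
# Hypothetical keyword lists; not the library's real intent classifier
READ_WORDS = {"get", "list", "read", "find", "show", "search"}
DESTRUCTIVE_WORDS = {"delete", "remove", "drop", "purge", "destroy"}


def classify_intent(query: str) -> str:
    """Very rough intent detection from query keywords."""
    words = set(query.lower().split())
    if words & DESTRUCTIVE_WORDS:
        return "destructive"
    if words & READ_WORDS:
        return "read"
    return "neutral"


def annotation_boost(tool: dict, intent: str, boost: float = 1.5) -> float:
    """Score multiplier applied when a tool's MCP annotations match the query intent."""
    ann = tool.get("annotations", {})
    if intent == "read" and ann.get("readOnlyHint"):
        return boost
    if intent == "destructive" and ann.get("destructiveHint"):
        return boost
    return 1.0


tools = [
    {"name": "read_file", "annotations": {"readOnlyHint": True, "destructiveHint": False}},
    {"name": "delete_file", "annotations": {"readOnlyHint": False, "destructiveHint": True}},
]
base = {"read_file": 1.0, "delete_file": 1.0}  # assume equal text relevance
for query in ["delete temporary files", "read the config file"]:
    intent = classify_intent(query)
    ranked = sorted(tools, key=lambda t: base[t["name"]] * annotation_boost(t, intent),
                    reverse=True)
    print(query, "->", intent, [t["name"] for t in ranked])
```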

Directly From an MCP Server

from graph_tool_call import ToolGraph

tg = ToolGraph()

# Public MCP endpoint
tg.ingest_mcp_server("https://mcp.example.com/mcp")

# Local/private MCP endpoint (explicit opt-in)
tg.ingest_mcp_server(
    "http://127.0.0.1:3000/mcp",
    allow_private_hosts=True,
)

ingest_mcp_server() sends a JSON-RPC tools/list request over HTTP, fetches the tool list, and ingests it with MCP annotations preserved.

Remote ingest safety defaults:

  • private / localhost hosts are blocked by default
  • remote response size is capped
  • redirects are limited
  • unexpected content types are rejected
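The request and the safety checks above can be sketched with the standard library. This is a simplified illustration, not the library's code: MAX_BYTES and the helper names are hypothetical, redirect limiting is omitted, and a real MCP client must also perform the initialize handshake and may need to accept text/event-stream responses, which this sketch rejects.

```python
import ipaddress
import json
import socket
from urllib.parse import urlparse

MAX_BYTES = 1_000_000  # hypothetical response-size cap


def tools_list_request(request_id: int = 1) -> bytes:
    """JSON-RPC 2.0 request body for the MCP tools/list method."""
    payload = {"jsonrpc": "2.0", "id": request_id, "method": "tools/list", "params": {}}
    return json.dumps(payload).encode("utf-8")


def is_private_host(url: str) -> bool:
    """True if the host resolves to a loopback/private/link-local/reserved address.

    Missing or unresolvable hosts are treated as private (fail closed).
    """
    host = urlparse(url).hostname
    if not host:
        return True
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if (ip.is_private or ip.is_loopback or ip.is_link_local
                or ip.is_reserved or ip.is_unspecified):
            return True
    return False


def check_response(content_type: str, body: bytes) -> dict:
    """Reject oversized or non-JSON responses before parsing."""
    if len(body) > MAX_BYTES:
        raise ValueError(f"response exceeds {MAX_BYTES} bytes")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type != "application/json":
        raise ValueError(f"unexpected content type: {media_type}")
    return json.loads(body)


print(is_private_host("http://127.0.0.1:3000/mcp"))  # True
print(is_private_host("https://8.8.8.8/mcp"))        # False
```

Checking every resolved address, rather than only the first, avoids a hostname that resolves to both a public and a private IP slipping through.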

From Python Functions

from graph_tool_call import ToolGraph

def read_file(path: str) -> str:
    """Read contents of a file."""

def write_file(path: str, content: str) -> None:
    """Write contents to a file."""

tg = ToolGraph()
tg.ingest_functions([read_file, write_file])

Parameters are extracted from type hints, descriptions from docstrings.
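One way to derive such a schema uses inspect and typing.get_type_hints. The sketch below is an assumption about the general approach, not graph-tool-call's code: the PY_TO_JSON mapping, the first-docstring-line description, and the optional append parameter (added to show how defaults become non-required) are all illustrative.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python annotations to JSON Schema types
PY_TO_JSON = {str: "string", int: "integer", float: "number",
              bool: "boolean", list: "array", dict: "object"}


def function_to_schema(fn) -> dict:
    """Build an OpenAI-style function schema from type hints and the docstring."""
    hints = get_type_hints(fn)
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        # Unknown or missing annotations fall back to "string"
        properties[name] = {"type": PY_TO_JSON.get(hints.get(name), "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default -> required
    doc = inspect.getdoc(fn) or ""
    return {
        "name": fn.__name__,
        "description": doc.splitlines()[0] if doc else "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def write_file(path: str, content: str, append: bool = False) -> None:
    """Write contents to a file."""


print(function_to_schema(write_file))
```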

Manual Tool Registration

from graph_tool_call import ToolGraph

tg = ToolGraph()

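tg.add_tools([
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_forecast",
            "description": "Get the weather forecast for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
            },
        },
    },
])

# Link the two registered tools with a COMPLEMENTARY relation
tg.add_relation("get_weather", "get_forecast", "complementary")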

Embedding-based Hybrid Search

Add embedding-based semantic search on top of BM25 + graph. No heavy dependencies needed — use any external embedding server (Ollama, OpenAI, vLLM, etc.) or local sentence-transformers.

pip install "graph-tool-call[embedding]"           # numpy only (~20MB)
pip install "graph-tool-call[embedding-local]"     # + sentence-transformers (~2GB, local models)

# Ollama (recommended — lightweight, cross-language)
tg.enable_embedding("ollama/qwen3-embedding:0.6b")

# OpenAI
tg.enable_embedding("openai/text-embedding-3-large")

# vLLM / llama.cpp / any OpenAI-compatible server
tg.enable_embedding("vllm/Qwen/Qwen3-Embedding-0.6B")
tg.enable_embedding("vllm/model@http://gpu-box:8000/v1")
tg.enable_embedding("llamacpp/model@http://192.168.1.10:8080/v1")
tg.enable_embedding("http://localhost:8000/v1@my-model")

# Sentence-transformers (requires embedding-local extra)
tg.enable_embedding("sentence-transformers/all-MiniLM-L6-v2")

# Custom callable
tg.enable_embedding(lambda texts: my_embed_fn(texts))

Weights are automatically rebalanced when embedding is enabled. You can fine-tune them:

tg.set_weights(keyword=0.1, graph=0.4, embedding=0.5)

Save and Load

Build once, reuse everywhere. The full graph structure (nodes, edges, relation types, weights) is preserved.

# Save
tg.save("my_graph.json")

# Load
tg = ToolGraph.load("my_graph.json")

# Or use cache= in from_url() for automatic save/load
tg = ToolGraph.from_url(url, cache="my_graph.json")

When embedding search is enabled, saved graphs also preserve:

  • embedding vectors
  • restorable embedding provider config when available
  • retrieval weights
  • diversity settings

This lets ToolGraph.load() restore hybrid retrieval state without rebuilding embeddings from scratch.

Analysis and Dashboard

report = tg.analyze()
print(report.orphan_tools)

app = tg.dashboard_app()
# or: tg.dashboard(port=8050)

analyze() builds an operational summary with duplicates, conflicts, orphan tools, category coverage, and relation counts. dashboard() launches the interactive Dash Cytoscape UI for graph inspection and retrieval testing.


Advanced Features

Cross-Encoder Reranking

Second-stage reranking using a cross-encoder model.

tg.enable_reranker()  # default: cross-encoder/ms-marco-MiniLM-L-6-v2
tools = tg.retrieve("cancel order", top_k=5)

After narrowing candidates with wRRF, (query, tool_description) pairs are jointly encoded for more precise ranking.

MMR Diversity

Reduces redundant results to secure more diverse candidates.

tg.enable_diversity(lambda_=0.7)
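MMR greedily picks the candidate that maximizes λ·relevance − (1 − λ)·(max similarity to already-selected tools). A minimal sketch with made-up relevance and similarity values; it assumes lambda_ in enable_diversity() plays the usual MMR role (1.0 = pure relevance), which the README does not state explicitly. The mmr_select and similarity names are hypothetical.

```python
def mmr_select(relevance: dict[str, float], similarity, k: int, lambda_: float = 0.7) -> list[str]:
    """Greedy Maximal Marginal Relevance: trade relevance against redundancy."""
    selected: list[str] = []
    pool = set(relevance)
    while pool and len(selected) < k:
        def mmr_score(candidate: str) -> float:
            # Redundancy = similarity to the closest already-selected tool
            redundancy = max((similarity(candidate, s) for s in selected), default=0.0)
            return lambda_ * relevance[candidate] - (1 - lambda_) * redundancy

        best = max(sorted(pool), key=mmr_score)  # sorted() makes ties deterministic
        selected.append(best)
        pool.remove(best)
    return selected


# Made-up scores: two near-identical "get order" tools and one distinct tool
relevance = {"getOrder": 0.90, "getOrderById": 0.88, "cancelOrder": 0.70}
PAIR_SIM = {frozenset({"getOrder", "getOrderById"}): 0.95}


def similarity(a: str, b: str) -> float:
    return PAIR_SIM.get(frozenset({a, b}), 0.2)


print(mmr_select(relevance, similarity, k=2, lambda_=0.7))  # ['getOrder', 'cancelOrder']
print(mmr_select(relevance, similarity, k=2, lambda_=1.0))  # ['getOrder', 'getOrderById']
```

With λ = 0.7 the second slot goes to cancelOrder because getOrderById is almost a copy of the tool already chosen; with λ = 1.0 the redundancy term vanishes and ranking is by relevance alone.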

History-Aware Retrieval

Pass previously called tool names to improve next-step retrieval.

# First call
tools = tg.retrieve("find my order")
# → [listOrders, getOrder, ...]

# Second call
tools = tg.retrieve("now cancel it", history=["listOrders", "getOrder"])
# → [cancelOrder, processRefund, ...]

Already-used tools are demoted, and tools closer to the next step in the graph are boosted.
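The idea can be sketched as a post-processing step on retrieval scores. The DEMOTE and BOOST factors, the adjacency-dict graph, the base scores, and the rerank_with_history name are hypothetical; the README only states that used tools are demoted and next-step tools are boosted.

```python
# Hypothetical factors; the README does not document the real values
DEMOTE = 0.3
BOOST = 1.5


def rerank_with_history(scores: dict[str, float], graph: dict[str, list[str]],
                        history: list[str]) -> list[str]:
    """Demote tools already called; boost graph successors of the most recent call."""
    used = set(history)
    next_steps = set(graph.get(history[-1], [])) if history else set()
    adjusted = {}
    for tool, score in scores.items():
        if tool in used:
            score *= DEMOTE
        elif tool in next_steps:
            score *= BOOST
        adjusted[tool] = score
    # Highest adjusted score first; ties broken by name
    return sorted(adjusted, key=lambda t: (-adjusted[t], t))


# PRECEDES edges as an adjacency list (source -> likely next steps)
graph = {
    "listOrders": ["getOrder", "cancelOrder"],
    "getOrder": ["cancelOrder"],
    "cancelOrder": ["processRefund"],
}
# Made-up base retrieval scores for "now cancel it"
scores = {"listOrders": 0.9, "getOrder": 0.8, "cancelOrder": 0.6, "processRefund": 0.3}
print(rerank_with_history(scores, graph, history=["listOrders", "getOrder"]))
# ['cancelOrder', 'processRefund', 'listOrders', 'getOrder']
```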

wRRF Weight Tuning

Adjust the contribution of each signal.

tg.set_weights(
    keyword=0.2,     # BM25 text matching
    graph=0.5,       # graph traversal
    embedding=0.3,   # semantic similarity
    annotation=0.2,  # MCP annotation matching
)

LLM-Enhanced Ontology

Build richer tool ontologies using any LLM. Useful for category generation, relation inference, and search keyword expansion.

import openai

# Any one of the following LLM inputs works
tg.auto_organize(llm="ollama/qwen2.5:7b")
tg.auto_organize(llm=lambda p: my_llm(p))   # any callable(str) -> str
tg.auto_organize(llm=openai.OpenAI())
tg.auto_organize(llm="litellm/claude-sonnet-4-20250514")
Supported LLM inputs

| Input | Wrapped as |
|---|---|
| OntologyLLM instance | Pass-through |
| callable(str) -> str | CallableOntologyLLM |
| OpenAI client (has chat.completions) | OpenAIClientOntologyLLM |
| "ollama/model" | OllamaOntologyLLM |
| "openai/model" | OpenAICompatibleOntologyLLM |
| "litellm/model" | litellm.completion wrapper |

Duplicate Detection

Find and merge duplicate tools across multiple API specs.

duplicates = tg.find_duplicates(threshold=0.85)
merged = tg.merge_duplicates(duplicates)
# merged = {"getUser_1": "getUser", ...}
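A name-only approximation of duplicate detection, using difflib from the standard library as a stand-in for the rapidfuzz-based similarity extra. The find_duplicate_names helper is hypothetical, and the library's actual scoring (which signals it compares and how threshold is applied) is not described in this README.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicate_names(names: list[str], threshold: float = 0.85) -> list[tuple[str, str, float]]:
    """Return name pairs whose case-insensitive similarity ratio is >= threshold."""
    pairs = []
    for a, b in combinations(names, 2):
        # ratio() = 2 * matched_chars / (len(a) + len(b)), in [0, 1]
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((a, b, round(ratio, 3)))
    return pairs


print(find_duplicate_names(["getUser", "getUser_1", "deleteUser", "listOrders"]))
# [('getUser', 'getUser_1', 0.875)]
```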

Export and Visualization

# Interactive HTML (vis.js)
tg.export_html("graph.html", progressive=True)

# GraphML (Gephi, yEd)
tg.export_graphml("graph.graphml")

# Neo4j Cypher
tg.export_cypher("graph.cypher")

API Spec Lint Integration

Auto-fix poor OpenAPI specs before ingestion using ai-api-lint.

pip install "graph-tool-call[lint]"
tg = ToolGraph.from_url(url, lint=True)

CLI Reference

# One-liner search (ingest + retrieve in one step)
graph-tool-call search "cancel order" --source https://api.example.com/openapi.json
graph-tool-call search "delete user" --source ./openapi.json --scores --json

# MCP server
graph-tool-call serve --source https://api.example.com/openapi.json
graph-tool-call serve --graph prebuilt.json
graph-tool-call serve -s https://api1.com/spec.json -s https://api2.com/spec.json

# Build and save graph
graph-tool-call ingest https://api.example.com/openapi.json -o graph.json
graph-tool-call ingest ./spec.yaml --embedding --organize

# Search from pre-built graph
graph-tool-call retrieve "query" -g graph.json -k 10

# Analyze, visualize, dashboard
graph-tool-call analyze graph.json --duplicates --conflicts
graph-tool-call visualize graph.json -f html
graph-tool-call info graph.json
graph-tool-call dashboard graph.json --port 8050

Full API Reference

ToolGraph methods

| Method | Description |
|---|---|
| add_tool(tool) | Add a single tool (auto-detects format) |
| add_tools(tools) | Add multiple tools |
| ingest_openapi(source) | Ingest from OpenAPI / Swagger spec |
| ingest_mcp_tools(tools) | Ingest from MCP tool list |
| ingest_mcp_server(url) | Fetch and ingest from MCP HTTP server |
| ingest_functions(fns) | Ingest from Python callables |
| ingest_arazzo(source) | Ingest Arazzo 1.0.0 workflow spec |
| from_url(url, cache=...) | Build from Swagger UI or spec URL |
| add_relation(src, tgt, type) | Add a manual relation |
| auto_organize(llm=...) | Auto-categorize tools |
| build_ontology(llm=...) | Build complete ontology |
| retrieve(query, top_k=10) | Search for tools |
| validate_tool_call(call) | Validate and auto-correct a tool call |
| assess_tool_call(call) | Return allow / confirm / deny decision |
| enable_embedding(provider) | Enable hybrid embedding search |
| enable_reranker(model) | Enable cross-encoder reranking |
| enable_diversity(lambda_) | Enable MMR diversity |
| set_weights(...) | Tune wRRF fusion weights |
| find_duplicates(threshold) | Find duplicate tools |
| merge_duplicates(pairs) | Merge detected duplicates |
| apply_conflicts() | Detect and add CONFLICTS_WITH edges |
| analyze() | Build operational analysis summary |
| save(path) / load(path) | Serialize / deserialize |
| export_html(path) | Export interactive HTML visualization |
| export_graphml(path) | Export to GraphML format |
| export_cypher(path) | Export as Neo4j Cypher statements |
| dashboard_app() / dashboard() | Build or launch interactive dashboard |
| suggest_next(tool, history=...) | Suggest next tools based on graph |

Feature Comparison

| Feature | Vector-only solutions | graph-tool-call |
|---|---|---|
| Dependencies | Embedding model required | Zero (core runs on stdlib) |
| Tool source | Manual registration | Auto-ingest from Swagger / OpenAPI / MCP |
| Search method | Flat vector similarity | Multi-stage hybrid (wRRF + rerank + MMR) |
| Behavioral semantics | None | MCP annotation-aware retrieval |
| Tool relations | None | 6 relation types, auto-detected |
| Call ordering | None | State machine + CRUD + response→request data flow |
| Deduplication | None | Cross-source duplicate detection |
| Ontology | None | Auto / LLM-Auto modes |
| History awareness | None | Demotes used tools, boosts next-step |
| Spec quality | Assumes good specs | ai-api-lint auto-fix integration |
| LLM dependency | Required | Optional (better with, works without) |
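The "response→request data flow" idea in the Call ordering row can be illustrated this way: if one tool's response contains a field that another tool requires as input, the first probably precedes the second. This is a simplified sketch with hypothetical tool metadata (response_fields / required_params) and exact field-name matching; real detection over OpenAPI schemas would also need to handle nesting, renamed fields, and types.

```python
def detect_data_flow(tools: dict[str, dict]) -> list[tuple[str, str]]:
    """Infer (producer, consumer) PRECEDES edges from shared field names."""
    edges = []
    for producer, p in tools.items():
        outputs = set(p.get("response_fields", []))
        for consumer, c in tools.items():
            # A producer feeds a consumer if any output matches a required input
            if producer != consumer and outputs & set(c.get("required_params", [])):
                edges.append((producer, consumer))
    return sorted(edges)


tools = {
    "listOrders": {"required_params": [], "response_fields": ["orderId", "status"]},
    "getOrder": {"required_params": ["orderId"], "response_fields": ["orderId", "paymentId"]},
    "cancelOrder": {"required_params": ["orderId"], "response_fields": ["status"]},
    "processRefund": {"required_params": ["paymentId"], "response_fields": ["refundId"]},
}
for edge in detect_data_flow(tools):
    print(" -> ".join(edge))
```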

Documentation

| Doc | Description |
|---|---|
| Architecture | System overview, pipeline layers, data model |
| WBS | Work Breakdown Structure: Phases 0–4 progress |
| Design | Algorithm design: spec normalization, dependency detection, search modes, call ordering, ontology modes |
| Research | Competitive analysis, API scale data, commerce patterns |
| Release Checklist | Release process, changelog flow, pre-release checks |
| OpenAPI Guide | How to write API specs that produce better tool graphs |

Contributing

Contributions are welcome.

# Development setup
git clone https://github.com/SonAIengine/graph-tool-call.git
cd graph-tool-call
pip install poetry
poetry install --with dev --all-extras   # install all optional deps for full test coverage

# Run tests
poetry run pytest -v

# Lint
poetry run ruff check .
poetry run ruff format --check .

# Run benchmarks
poetry run python -m benchmarks.run_benchmark -v

License

MIT
