🛡️ Agent Shield

Full-stack security for AI Agents — Static Analysis + Runtime Interception

Catch data exfiltration, backdoors, prompt injection, tool poisoning, and supply chain attacks before they reach your AI agents — and intercept them at runtime.

Offline-first. AST-powered. Open source. Your data never leaves your machine.

npx @elliotllliu/agent-shield scan ./my-skill/

🏆 Three Things No Other Tool Does

1. 🔒 Runtime MCP Interception (Only Agent Shield)

Other tools only scan source code before install. Agent Shield also sits between your MCP client and server, intercepting every JSON-RPC message in real time:

# Insert Agent Shield between client and server
agent-shield proxy node my-mcp-server.js

# Enforce mode: automatically block high-risk tool calls
agent-shield proxy --enforce python mcp_server.py

# Rate-limit + log all alerts
agent-shield proxy --rate-limit 30 --log alerts.jsonl node server.js

What it catches at runtime:

🎭 Tool description injection — hidden instructions in tool descriptions
💉 Result injection — malicious content in tool return values
🔑 Credential leakage — sensitive data in tool call parameters
📡 Beacon behavior — abnormal periodic callbacks (C2 pattern)
🪤 Rug-pull attacks — tools changing behavior after initial trust

Snyk doesn't have this. AgentSeal doesn't have this. This is the only open-source tool with static + runtime protection.
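
As a rough illustration of two of the runtime checks listed above (tool description injection and rug-pull detection), here is a minimal Python sketch. It is not Agent Shield's implementation: the `SUSPICIOUS` phrases, the `inspect_tools_list` helper, and the hash-pinning approach are assumptions made for the example. It only assumes that a tool listing arrives as a JSON-RPC response whose `result.tools` entries carry `name` and `description` fields.

```python
import hashlib
import json
import re

# Hypothetical phrases that hint at hidden instructions (not the real rule set).
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior|above) instructions", re.I),
    re.compile(r"do not (tell|inform) the user", re.I),
    re.compile(r"<(important|system|secret)>", re.I),
]


def fingerprint(tool: dict) -> str:
    """Stable hash of a tool definition, used to notice later changes."""
    canonical = json.dumps(tool, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def inspect_tools_list(raw_message: str, pinned: dict) -> list:
    """Return alerts for one JSON-RPC response that carries a `tools` array.

    `pinned` maps tool name -> fingerprint seen the first time the tool appeared.
    """
    alerts = []
    message = json.loads(raw_message)
    tools = (message.get("result") or {}).get("tools", [])
    for tool in tools:
        name = tool.get("name", "<unnamed>")
        description = tool.get("description") or ""
        if any(p.search(description) for p in SUSPICIOUS):
            alerts.append(f"description-injection: {name}")
        digest = fingerprint(tool)
        if name in pinned and pinned[name] != digest:
            alerts.append(f"rug-pull: {name} changed after first use")
        pinned.setdefault(name, digest)  # keep the first-seen definition
    return alerts
```

A proxy could call `inspect_tools_list` on every server response and, in an enforce-style mode, drop messages that produce alerts. Pinning the first-seen fingerprint is one simple way to notice a tool that changes after it has been trusted.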

2. ⛓️ Cross-File Attack Chain Detection (Only Agent Shield)

Most scanners check one file at a time. Agent Shield traces data flow across your entire codebase to detect multi-file attack patterns:

🔴 Cross-file data flow:
   config_reader.py reads ~/.ssh/id_rsa → exfiltrator.py POSTs to external server
   (connected via imports)

A 5-stage kill chain model detects complete attack sequences (the example below matches three of the five stages):

🔴 Kill Chain detected:
   apt.py:4  → system info collection    [Reconnaissance]
   reader.py:8  → reads ~/.ssh/id_rsa    [Collection]
   sender.py:12 → POST to external server [Exfiltration]

   Reconnaissance → Access → Collection → Exfiltration → Persistence

Not just individual alerts — complete attack narratives.
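
The cross-file idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual algorithm: `FileFacts`, `reachable`, and `cross_file_flows` are hypothetical names, and the sketch assumes a per-file pass has already recorded whether each file reads secrets, sends HTTP requests, and which files it imports.

```python
from dataclasses import dataclass, field


@dataclass
class FileFacts:
    """Per-file facts a single-file scan might already produce (hypothetical)."""
    reads_secrets: bool = False
    sends_http: bool = False
    imports: set = field(default_factory=set)


def reachable(start: str, facts: dict) -> set:
    """All known files reachable from `start` by following imports, including itself."""
    seen, stack = set(), [start]
    while stack:
        current = stack.pop()
        if current in seen or current not in facts:
            continue
        seen.add(current)
        stack.extend(facts[current].imports)
    return seen


def cross_file_flows(facts: dict) -> list:
    """Pairs (reader, sender): an HTTP sender that directly or transitively imports a secret reader."""
    flows = []
    for sender, info in facts.items():
        if not info.sends_http:
            continue
        for other in sorted(reachable(sender, facts)):
            if other != sender and facts[other].reads_secrets:
                flows.append((other, sender))
    return flows


facts = {
    "config_reader.py": FileFacts(reads_secrets=True),
    "exfiltrator.py": FileFacts(sends_http=True, imports={"config_reader.py"}),
    "utils.py": FileFacts(),
}
print(cross_file_flows(facts))
```

The sketch only follows imports in one direction (sender imports reader). A fuller analysis would also have to track which values actually flow between the files.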

3. 🧠 AST Taint Tracking (Not Regex)

Uses Python's ast module for precise analysis — dramatically reducing false positives:

user = input("cmd: ")
eval(user)          # → 🔴 HIGH: tainted input flows to eval
eval("{'a': 1}")    # → ✅ NOT flagged (safe string literal)
exec(config_var)    # → 🟡 MEDIUM: dynamic, not proven tainted

Case	Regex-based	AST-based (Agent Shield)
`eval("safe string")`	❌ False positive	✅ Not flagged
`# eval(x)` in comment	❌ False positive	✅ Not flagged
`eval(user_input)` tainted	⚠️ Can't distinguish	✅ HIGH (tainted)
f-string SQL injection	⚠️ Coarse	✅ Precise
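
To make the difference concrete, the following is a deliberately small taint tracker built on Python's standard `ast` module. It is a sketch, not Agent Shield's analyzer: the `SOURCES` and `SINKS` sets and the `scan` function are assumptions for illustration, and it only follows simple `name = value` assignments in top-level statements.

```python
import ast

SOURCES = {"input"}        # calls whose return value is treated as attacker-controlled
SINKS = {"eval", "exec"}   # calls that execute their first argument


def call_name(node):
    """Name of a plain function call such as input(...), else None."""
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return node.func.id
    return None


def is_tainted(expr, tainted):
    """True if `expr` is a source call or a variable already marked tainted."""
    if call_name(expr) in SOURCES:
        return True
    return isinstance(expr, ast.Name) and expr.id in tainted


def scan(source: str) -> list:
    """Return (line, sink, severity) tuples for eval/exec calls in `source`."""
    tainted, findings = set(), []
    for stmt in ast.parse(source).body:
        # 1. Check every sink call inside this statement.
        for node in ast.walk(stmt):
            sink = call_name(node)
            if sink not in SINKS or not node.args:
                continue
            arg = node.args[0]
            if isinstance(arg, ast.Constant):
                continue  # literal argument: nothing dynamic to execute
            severity = "HIGH" if is_tainted(arg, tainted) else "MEDIUM"
            findings.append((node.lineno, sink, severity))
        # 2. Propagate taint through simple assignments.
        if isinstance(stmt, ast.Assign):
            flag = is_tainted(stmt.value, tainted)
            for target in stmt.targets:
                if isinstance(target, ast.Name):
                    (tainted.add if flag else tainted.discard)(target.id)
    return findings
```

Because comments never reach the syntax tree and string literals appear as `ast.Constant` nodes, the two false-positive cases in the table fall out of the representation rather than needing special handling.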

⚡ Quick Start

# Scan a skill / MCP server / plugin (31 rules, offline, <1s)
npx @elliotllliu/agent-shield scan ./my-skill/

# Scan Dify plugins (.difypkg auto-extraction)
npx @elliotllliu/agent-shield scan ./plugin.difypkg

# Runtime interception (MCP proxy)
npx @elliotllliu/agent-shield proxy node my-mcp-server.js

# AI-powered deep analysis (uses YOUR API key)
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider openai --model gpt-4o
npx @elliotllliu/agent-shield scan ./skill/ --ai --provider ollama --model llama3

# Discover installed agents on your machine
npx @elliotllliu/agent-shield discover

# Check if installed agents are safe
npx @elliotllliu/agent-shield install-check

# SARIF output for GitHub Code Scanning
npx @elliotllliu/agent-shield scan ./skill/ --sarif -o results.sarif

# HTML report
npx @elliotllliu/agent-shield scan ./skill/ --html

# CI/CD gate
npx @elliotllliu/agent-shield scan ./skill/ --fail-under 70

📊 Agent Shield vs Competitors

Feature	Agent Shield	Snyk Agent Scan	Tencent AI-Infra-Guard
Runtime MCP Interception	✅ MCP Proxy	❌	❌
Cross-file Attack Chain	✅	❌	Partial
AST Taint Tracking	✅ Python	❌	Unknown
Static Rules	31	6	Many (incl. infra)
Multi-language Injection	✅ 8 languages	❌ English only	Unknown
Description-Code Integrity	✅	❌	Unknown
Python Security	✅ 35 patterns + AST	❌	✅
Prompt Injection	✅ 55+ patterns + AI	✅ LLM (cloud)	Unknown
100% Offline	✅	❌ cloud required	✅
Zero Install (`npx`)	✅	❌ Python + uv	❌ Docker
Choose Your Own LLM	✅ OpenAI/Anthropic/Ollama	❌	❌
VS Code Extension	✅	❌	❌
GitHub App + Action	✅	❌	❌
Open Source	✅ MIT	❌	✅

🔍 31 Security Rules

🔴 High Risk

Rule	Detects
`data-exfil`	Reads sensitive data + sends HTTP requests (exfiltration pattern)
`backdoor`	`eval()`, `exec()`, `new Function()`, `child_process.exec()` with dynamic input
`reverse-shell`	Outbound socket connections piped to shell
`crypto-mining`	Mining pool connections, xmrig, coinhive
`credential-hardcode`	Hardcoded AWS keys (`AKIA...`), GitHub PATs, Stripe/Slack tokens
`obfuscation`	`eval(atob(...))`, hex chains, `String.fromCharCode` obfuscation

🟡 Medium Risk

Rule	Detects
`prompt-injection`	55+ patterns: instruction override, identity manipulation, TPA, encoding evasion
`tool-shadowing`	Cross-server tool name conflicts, tool override attacks
`env-leak`	Environment variables + outbound HTTP (credential theft)
`network-ssrf`	User-controlled URLs, AWS metadata endpoint access
`phone-home`	Periodic timer + HTTP request (beacon/C2 pattern)
`toxic-flow`	Cross-tool data leak and destructive flows
`skill-risks`	Financial ops, untrusted content, external dependencies
`python-security`	35 patterns: eval, pickle, subprocess, SQL injection, SSTI, path traversal
`go-rust-security`	22 patterns: command injection, unsafe blocks, raw SQL

🟢 Low Risk

Rule	Detects
`privilege`	SKILL.md declared permissions vs actual code behavior mismatch
`supply-chain`	Known CVEs in npm dependencies
`sensitive-read`	Access to `~/.ssh`, `~/.aws`, `~/.kube`
`excessive-perms`	Too many or dangerous permissions in SKILL.md
`mcp-manifest`	MCP server: wildcard perms, undeclared capabilities
`typosquatting`	Suspicious npm names: `1odash` → `lodash`
`hidden-files`	`.env` files with secrets committed to repo
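
As an example of how a rule like `typosquatting` could work, the sketch below flags package names that are one edit away from a well-known name or that match it after swapping look-alike characters. The `POPULAR` list, the `LOOKALIKES` mapping, and both function names are assumptions for illustration. The real rule may use different data and thresholds.

```python
# Hypothetical list of well-known package names to protect.
POPULAR = {"lodash", "express", "react", "axios"}

# Digits that are commonly used to imitate letters.
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "5": "s"})


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution
            ))
        previous = current
    return previous[-1]


def typosquat_target(name: str):
    """Return the popular package `name` appears to imitate, or None."""
    if name in POPULAR:
        return None
    normalized = name.translate(LOOKALIKES)
    for known in sorted(POPULAR):
        if normalized == known or edit_distance(name, known) == 1:
            return known
    return None
```

With these assumptions, `typosquat_target("1odash")` returns `"lodash"`, while the genuine `lodash` and unrelated names return `None`.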

🆕 Advanced Detection (Unique to Agent Shield)

Rule	Detects	Snyk?
`cross-file`	Cross-file data flow: File A reads secrets → File B sends HTTP	❌
`attack-chain`	Kill chain detection: Recon → Access → Collection → Exfil → Persistence	❌
`multilang-injection`	8-language injection: Chinese/Japanese/Korean/Russian/Arabic/Spanish/French/German prompt injection	❌
`python-ast`	AST taint tracking: follows data from `input()` → `eval()`	❌
`description-integrity`	Description vs code: "read-only" tool that writes files	❌
`mcp-runtime`	MCP runtime: debug inspector, non-HTTPS, tool count explosion	❌

🔬 Multi-Language Prompt Injection

8 languages — most scanners only check English:

忽略上面的指令          → 🔴 Chinese injection
前の指示を無視          → 🔴 Japanese injection
이전의 지시를 무시       → 🔴 Korean injection
Игнорируй инструкции   → 🔴 Russian injection
تجاهل التعليمات        → 🔴 Arabic injection
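
A minimal way to picture multi-language matching is a table of per-language patterns applied to normalized text. The sketch below is an assumption-laden illustration, not the shipped rule set: the patterns are derived only from the sample phrases above plus one English pattern, and `detect_injection` is a hypothetical helper.

```python
import re
import unicodedata

# Illustrative patterns based on the sample phrases in this section.
PATTERNS = {
    "English": re.compile(r"ignore\s+(the\s+)?(previous|above)\s+instructions", re.I),
    "Chinese": re.compile(r"忽略.{0,6}指令"),
    "Japanese": re.compile(r"前の指示を無視"),
    "Korean": re.compile(r"이전의?\s*지시를?\s*무시"),
    "Russian": re.compile(r"игнорируй\s+(\S+\s+)?инструкции", re.I),
    "Arabic": re.compile(r"تجاهل\s+التعليمات"),
}


def detect_injection(text: str) -> list:
    """Return the languages whose injection pattern matches `text`."""
    # NFKC folds full-width and compatibility forms before matching.
    normalized = unicodedata.normalize("NFKC", text)
    return [lang for lang, pattern in PATTERNS.items() if pattern.search(normalized)]
```

Normalizing first means that visually equivalent variants (for example full-width Latin letters) are compared in a single canonical form. Real-world coverage would need many more phrasings per language.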

📋 Real-World Validation: 493 Dify Plugins

We scanned the entire langgenius/dify-plugins repository:

Metric	Value
Plugins scanned	493
Files analyzed	9,862
Lines of code	939,367
Scan time	~120s
Average score	93/100

Risk Level	Count	%
🔴 High risk (real issues)	6	1.2%
🟡 Medium risk	73	14.8%
🟢 Clean	414	84.0%

Six plugins are confirmed high risk: each executes dynamically built code through eval() or exec().

Full report →

💡 Example Output

🛡️  Agent Shield Scan Report
📁 Scanned: ./deceptive-tool (3 files, 25 lines)

Score: 0/100 (Critical Risk)

🔴 High Risk: 4 findings
🟡 Medium Risk: 6 findings
🟢 Low Risk: 1 finding

🔴 High Risk (4)
  ├─ calculator.py:7 — [backdoor] eval() with dynamic input
  │  result = eval(expr)
  ├─ manifest.yaml — [description-integrity] Scope creep: "calculator"
  │  tool sends emails — undisclosed and suspicious capability
  ├─ tools/calc.yaml — [description-integrity] Description claims
  │  "local only" but code makes network requests in: tools/calc.py
  └─ exfiltrator.py — [cross-file] Cross-file data flow:
     config_reader.py reads secrets → exfiltrator.py sends HTTP

⏱  136ms

🔌 Integrate Agent Shield Into Your Platform

Running a skill marketplace, MCP directory, or plugin registry? This section is for you.

Your platform lists hundreds of skills, MCP servers, and plugins. Users install them into AI agents with access to files, credentials, and shell commands. But:

❌ Nobody verifies what gets listed. A skill with eval(atob(...)) looks the same as a clean one.
❌ Users can't tell safe from dangerous. There's no security signal anywhere.
❌ One bad skill = total compromise. Credential theft, data exfiltration, reverse shells.

What You Get

	Without Agent Shield	With Agent Shield
User trust	"Is this safe?" — no idea	🟢🟡🟠🔴 Security score on every listing
Platform reputation	Same as every directory	"The only marketplace that verifies security"
Bad actors	Malicious skills sit undetected	Auto-flagged before users see them

How to Integrate (5 minutes)

npx @elliotllliu/agent-shield scan ./skill --format json

{
  "score": 92,
  "totalFindings": 1,
  "summary": { "high": 0, "medium": 1, "low": 0 },
  "findings": [
    {
      "severity": "medium",
      "rule": "env-leak",
      "file": "src/config.ts",
      "line": 8,
      "message": "Environment variable access without validation"
    }
  ]
}

Store the JSON, render the badge. That's it.
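
On the platform side, turning a stored report into a badge could look like the sketch below. It assumes only the `score` and `summary` fields shown in the JSON above and the score bands from the Scoring section. The `badge_for` helper and the returned dictionary shape are hypothetical.

```python
import json

# Score bands from the Scoring section: (minimum score, label).
BANDS = [(90, "low"), (70, "moderate"), (40, "high"), (0, "critical")]


def badge_for(report_json: str) -> dict:
    """Map a stored scan report to the data a listing page needs for its badge."""
    report = json.loads(report_json)
    score = report["score"]
    # First band whose floor the score reaches; anything below 0 counts as critical.
    level = next((label for floor, label in BANDS if score >= floor), "critical")
    summary = report.get("summary", {})
    return {
        "level": level,
        "score": score,
        "high_findings": summary.get("high", 0),
    }
```

A listing page could then pick a color from `level` and show `score` next to it. Keeping the full JSON alongside lets you show individual findings on demand.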

📖 Full Integration Guide →

Who Should Integrate

Platform Type	Examples	Value
Skill directories	ClawHub, skills.sh	Security badges on every skill
MCP registries	mcp.so, Smithery, Glama	Scan servers before listing
Plugin marketplaces	Dify store, GPT store	Gate submissions by security score
Agent platforms	OpenClaw, Cline, Cursor	Warn users before install

📦 Ecosystem

🤖 GitHub App

Auto-scan every PR for security issues. Learn more →

💻 VS Code Extension

Real-time security diagnostics in your editor. Learn more →

🔒 Runtime MCP Proxy

Monitor MCP server behavior in real time. Detect injection, exfiltration, and rug-pull attacks.

agent-shield proxy --enforce node my-mcp-server.js

⚙️ CI Integration

GitHub Action

name: Security Scan
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: elliotllliu/agent-shield@main
        with:
          path: './skills/'
          fail-under: '70'

GitHub Action with SARIF Upload

name: Security Scan (SARIF)
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    permissions:
      security-events: write
    steps:
      - uses: actions/checkout@v4
      - uses: elliotllliu/agent-shield@main
        with:
          path: './skills/'
          fail-under: '70'
          sarif: 'true'
      - name: Upload SARIF
        if: always()
        uses: github/codeql-action/upload-sarif@v3
        with:
          sarif_file: agent-shield-results.sarif

npx one-liner

- name: Security scan
  run: npx -y @elliotllliu/agent-shield scan . --fail-under 70

⚙️ Configuration

Create .agent-shield.yml (or run agent-shield init):

rules:
  disable:
    - supply-chain
    - phone-home
failUnder: 70
ignore:
  - "tests/**"
  - "*.test.ts"

Scoring

Severity	Points
🔴 High	-25
🟡 Medium	-8
🟢 Low	-2

Score	Risk Level
90-100	✅ Low Risk — safe to install
70-89	🟡 Moderate — review warnings
40-69	🟠 High Risk — investigate before using
0-39	🔴 Critical — do not install
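
The two tables can be read as a simple formula. The sketch below assumes the score starts at 100, subtracts the listed points per finding, and is floored at 0. The floor is an inference from the example report above (4 high, 6 medium, and 1 low finding shown as 0/100), not something the tables state. The function names are hypothetical.

```python
# Points deducted per finding, from the severity table.
PENALTY = {"high": 25, "medium": 8, "low": 2}

# (minimum score, risk level), from the score table.
LEVELS = [(90, "Low Risk"), (70, "Moderate"), (40, "High Risk"), (0, "Critical")]


def compute_score(summary: dict) -> int:
    """Start at 100, subtract per-finding penalties, never go below 0 (assumed floor)."""
    deducted = sum(PENALTY[severity] * count for severity, count in summary.items())
    return max(0, 100 - deducted)


def risk_level(score: int) -> str:
    """Label of the first band whose minimum the score reaches."""
    return next(label for floor, label in LEVELS if score >= floor)


example = {"high": 4, "medium": 6, "low": 1}   # counts from the example report
print(compute_score(example), risk_level(compute_score(example)))
```

Under these assumptions a single high-risk finding already drops a project to 75 (Moderate), and two drop it to 50 (High Risk), which is why `--fail-under 70` acts as a strict gate.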

🗂️ Supported Platforms

Platform	Support
AI Agent Skills	OpenClaw, Codex, Claude Code
MCP Servers	Model Context Protocol tool servers
Dify Plugins	`.difypkg` archive extraction + scan
npm Packages	Any package with executable code
Python Projects	AST analysis + 35 security patterns
General	Any directory with JS/TS/Python/Go/Rust/Shell code

File Types

Language	Extensions
JavaScript/TypeScript	`.js`, `.ts`, `.mjs`, `.cjs`, `.tsx`, `.jsx`
Python	`.py` (regex + AST analysis)
Go	`.go`
Rust	`.rs`
Shell	`.sh`, `.bash`, `.zsh`
Config	`.json`, `.yaml`, `.yml`, `.toml`
Docs	`SKILL.md`, `manifest.yaml`

🤝 Contributing

We especially welcome:

New detection rules
False positive / false negative reports
Third-party benchmark test results

See CONTRIBUTING.md

Links

📦 npm · 📖 Rule Docs · 🤖 GitHub App · 💻 VS Code · 🔌 Integration Guide · 🇨🇳 Chinese README

License

MIT
