# MCP 101

## Calling a tool
- Make sure that nothing is listening on ports 8000 and 8080. Open 3 generously sized terminals on your screen.
- Download a sensible model. Qwen 3.5 4B is sensible.
- Compile a fresh llama.cpp:

  ```shell
  git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
  cmake -B build && cmake --build build --config Release -j 6
  ```

- Launch the llama in terminal #1:

  ```shell
  ./llama-server -m ~/Downloads/Qwen3.5-4B-Q8_0.gguf --ctx-size 4096 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00 --verbose --webui-mcp-proxy
  ```

- Clone this repository:

  ```shell
  git clone https://github.com/behavioral-ds/mcp-example && cd mcp-example
  ```

- Install the dependencies:

  ```shell
  poetry install && poetry shell
  ```

- Launch the MCP server in terminal #2:

  ```shell
  python mcp_serve.py
  ```

- Execute the Agentic Call™ in terminal #3:

  ```shell
  python call.py
  ```

- Observe the dance between LLM <-> Inference engine <-> MCP <-> Client.
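The "dance" in the last step is plain JSON-RPC 2.0 under the hood: MCP clients and servers exchange JSON-RPC messages, and a tool invocation travels as a `tools/call` request. A rough, hand-written sketch of that exchange follows — the tool name `get_weather` and its arguments are hypothetical illustrations, not tools defined by this repository:

```python
import json

# When the model decides to call a tool, the client wraps that decision
# in a JSON-RPC 2.0 request to the MCP server (MCP is JSON-RPC based).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",           # hypothetical tool name
        "arguments": {"city": "Sydney"}  # arguments the LLM produced
    },
}

# The server answers with a result whose content is a list of typed
# blocks; plain text blocks are the common case.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Sunny, 24 C"}]
    },
}

print(json.dumps(request, indent=2))
print(response["result"]["content"][0]["text"])
```

Watching terminal #2 with `--verbose` inference running in terminal #1 should show messages of roughly this shape flying back and forth.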
## Using MCP prompts

- Open the llama.cpp web UI at http://localhost:8080/, go to Settings and add a new MCP server.
- Select "MCP prompt" when drafting a new message.
- That's your `@mcp.prompt()` parsed into a UI element; click it.
- ...and supply some meaningful content.
- Then click "Use prompt" and rejoice.
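Under the hood, the "MCP prompt" button drives the `prompts/get` side of the protocol: the UI sends the content you supplied as arguments to your `@mcp.prompt()` function, and the server renders it into ready-made chat messages. A hand-written sketch of that exchange, assuming a hypothetical prompt named `summarize` (not necessarily what this repo's server defines):

```python
import json

# The web UI asks the MCP server to expand a named prompt template,
# passing the content you typed in as arguments (prompts/get in MCP).
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "prompts/get",
    "params": {
        "name": "summarize",                      # hypothetical @mcp.prompt() name
        "arguments": {"text": "MCP in a nutshell"},  # your supplied content
    },
}

# The server renders the template into chat messages that the UI can
# drop straight into the conversation when you click "Use prompt".
response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "messages": [
            {
                "role": "user",
                "content": {
                    "type": "text",
                    "text": "Please summarize: MCP in a nutshell",
                },
            }
        ]
    },
}

print(json.dumps(request["params"], indent=2))
print(response["result"]["messages"][0]["content"]["text"])
```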