# MCP 101

## Calling a tool
- Make sure that nothing is listening on ports 8000 and 8080. Open 3 generously sized terminals on your screen.
- Download a sensible model. Qwen 3.5 4B is sensible.
- Compile a fresh llama.cpp:

  ```shell
  git clone https://github.com/ggml-org/llama.cpp && cd llama.cpp
  cmake -B build && cmake --build build --config Release -j 6
  ```

- Launch the llama in terminal #1:

  ```shell
  ./llama-server -m ~/Downloads/Qwen3.5-4B-Q8_0.gguf --ctx-size 4096 --temp 1.0 --top-p 0.95 --top-k 20 --min-p 0.00 --verbose --webui-mcp-proxy
  ```

- Clone this repository:

  ```shell
  git clone https://github.com/behavioral-ds/mcp-example && cd mcp-example
  ```

- Install the dependencies:

  ```shell
  poetry install && poetry shell
  ```

- Launch the MCP server in terminal #2:

  ```shell
  python mcp_serve.py
  ```

- Execute the Agentic Call™ in terminal #3:

  ```shell
  python call.py
  ```

- Observe the dance between LLM <-> Inference engine <-> MCP <-> Client.
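The "dance" in the last step is plain JSON-RPC 2.0 under the hood: MCP clients and servers exchange JSON-RPC messages, and a tool invocation travels as a `tools/call` request. A rough, hand-written sketch of that exchange follows — the tool name `get_weather` and its arguments are hypothetical illustrations, not tools defined by this repository:

```python
import json

# When the model decides to call a tool, the client wraps that decision
# in a JSON-RPC 2.0 request to the MCP server (MCP is JSON-RPC based).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_weather",           # hypothetical tool name
        "arguments": {"city": "Sydney"}  # arguments the LLM produced
    },
}

# The server answers with a result whose content is a list of typed
# blocks; plain text blocks are the common case.
response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "Sunny, 24 C"}]
    },
}

print(json.dumps(request, indent=2))
print(response["result"]["content"][0]["text"])
```

Watching terminal #2 with `--verbose` inference running in terminal #1 should show messages of roughly this shape flying back and forth.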
## Using MCP prompts

- Open the llama.cpp web UI at http://localhost:8080/, go to Settings and add a new MCP server.
- Select "MCP prompt" when drafting a new message.
- That's your `@mcp.prompt()` parsed into a UI element; click it.
- ...and supply some meaningful content.
- Then click "Use prompt" and rejoice.
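Under the hood, the "MCP prompt" button drives the `prompts/get` side of the protocol: the UI sends the content you supplied as arguments to your `@mcp.prompt()` function, and the server renders it into ready-made chat messages. A hand-written sketch of that exchange, assuming a hypothetical prompt named `summarize` (not necessarily what this repo's server defines):

```python
import json

# The web UI asks the MCP server to expand a named prompt template,
# passing the content you typed in as arguments (prompts/get in MCP).
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "prompts/get",
    "params": {
        "name": "summarize",                      # hypothetical @mcp.prompt() name
        "arguments": {"text": "MCP in a nutshell"},  # your supplied content
    },
}

# The server renders the template into chat messages that the UI can
# drop straight into the conversation when you click "Use prompt".
response = {
    "jsonrpc": "2.0",
    "id": 2,
    "result": {
        "messages": [
            {
                "role": "user",
                "content": {
                    "type": "text",
                    "text": "Please summarize: MCP in a nutshell",
                },
            }
        ]
    },
}

print(json.dumps(request["params"], indent=2))
print(response["result"]["messages"][0]["content"]["text"])
```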