MCP Performance Optimization Guide
Make your MCP servers fast with caching, connection pooling, and async patterns.
Every millisecond of MCP latency adds to AI response time. Users waiting 10+ seconds for a response will abandon your tool. This guide covers the patterns that make MCP servers fast.
Why Performance Matters
Slow MCP servers create a poor user experience. When an AI assistant calls your tool and waits 5 seconds for a response, the entire conversation feels sluggish. Fast servers = better UX = more usage.
Async Everything
MCP is inherently async. Don't block the event loop:
```python
# BAD - blocks the event loop
import requests

@server.tool()
def slow_tool():
    result = requests.get("https://api.example.com")  # Blocking!
    return result.json()
```

```python
# GOOD - non-blocking
import aiohttp

@server.tool()
async def fast_tool():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://api.example.com") as response:
            return await response.json()
```
Connection Pooling
Create connections once, reuse them:
```python
import aiohttp
import asyncpg

class MCPServer:
    def __init__(self):
        self.session = None
        self.db_pool = None

    async def startup(self):
        # HTTP connection pool: reuse sockets across requests
        self.session = aiohttp.ClientSession(
            connector=aiohttp.TCPConnector(limit=100)
        )
        # Database connection pool
        self.db_pool = await asyncpg.create_pool(
            DATABASE_URL,
            min_size=5,
            max_size=20
        )

    async def shutdown(self):
        await self.session.close()
        await self.db_pool.close()
```
Caching Strategies
In-Memory Cache
For frequently accessed, rarely changing data:
```python
from functools import lru_cache
from cachetools import TTLCache

# Simple LRU cache
@lru_cache(maxsize=1000)
def get_config(key):
    return load_from_database(key)

# Async TTL cache
cache = TTLCache(maxsize=1000, ttl=300)  # 5 minute TTL

async def cached_fetch(url):
    if url in cache:
        return cache[url]
    result = await fetch(url)
    cache[url] = result
    return result
```
Redis for Distributed Caching
When running multiple MCP server instances:
```python
import json

import redis.asyncio as redis

class CachedMCPServer:
    def __init__(self):
        self.redis = redis.Redis(host='localhost', port=6379)

    async def get_cached(self, key, fetch_func, ttl=300):
        # Try cache first
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        # Cache miss - fetch and store
        result = await fetch_func()
        await self.redis.setex(key, ttl, json.dumps(result))
        return result
```
Batch Operations
Combine multiple requests into one:
```python
@server.tool()
async def get_users_batch(user_ids: list[str]):
    # BAD: N database queries
    # users = [await db.get_user(id) for id in user_ids]

    # GOOD: 1 database query
    users = await db.get_users_where_id_in(user_ids)
    return users
```
Streaming Responses
For large results, stream instead of buffering:
```python
import aiofiles
from starlette.responses import StreamingResponse  # or your framework's equivalent

@server.tool()
async def stream_large_file(path: str):
    async def generate():
        async with aiofiles.open(path, 'r') as f:
            async for line in f:
                yield line
    return StreamingResponse(generate())
```
Timeout Handling
Don't let slow operations hang forever:
```python
import asyncio

@server.tool()
async def fetch_with_timeout(url: str):
    try:
        return await asyncio.wait_for(
            fetch(url),
            timeout=5.0  # 5 second timeout
        )
    except asyncio.TimeoutError:
        return {"error": "Request timed out"}
```
Benchmarking
Measure before optimizing:
```python
import time
import statistics

async def benchmark_tool(tool_func, iterations=100):
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        await tool_func()
        times.append(time.perf_counter() - start)
    sorted_times = sorted(times)
    return {
        "mean": statistics.mean(times) * 1000,  # all values in ms
        "median": statistics.median(times) * 1000,
        "p95": sorted_times[int(iterations * 0.95)] * 1000,
        "min": min(times) * 1000,
        "max": max(times) * 1000,
    }
```
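To sanity-check the numbers, the same pattern can be run standalone against a stand-in tool. This is a self-contained sketch (the helper is repeated for convenience, and `dummy_tool` is a hypothetical placeholder simulating ~1 ms of I/O):

```python
import asyncio
import statistics
import time

async def benchmark_tool(tool_func, iterations=100):
    # Time each call and summarize the results in milliseconds
    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        await tool_func()
        times.append(time.perf_counter() - start)
    sorted_times = sorted(times)
    return {
        "mean": statistics.mean(times) * 1000,
        "median": statistics.median(times) * 1000,
        "p95": sorted_times[int(iterations * 0.95)] * 1000,
        "min": min(times) * 1000,
        "max": max(times) * 1000,
    }

async def dummy_tool():
    # Stand-in for a real MCP tool: ~1 ms of simulated I/O
    await asyncio.sleep(0.001)

stats = asyncio.run(benchmark_tool(dummy_tool, iterations=20))
print(f"median: {stats['median']:.2f} ms, p95: {stats['p95']:.2f} ms")
```

Swap `dummy_tool` for your real tool function to get a baseline before and after each optimization.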
Performance Checklist
- ☐ All I/O operations are async
- ☐ Connection pools for HTTP and database
- ☐ Caching for repeated queries (TTL appropriate)
- ☐ Batch operations where possible
- ☐ Timeouts on all external calls
- ☐ Streaming for large responses
- ☐ Profiled and benchmarked critical paths
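Several checklist items can be combined in a single tool. The sketch below is illustrative, not a definitive implementation: `fetch_data` is a hypothetical backend call, a plain dict stands in for `TTLCache`, and a per-key `asyncio.Lock` prevents concurrent callers from stampeding the backend on a cache miss:

```python
import asyncio
import time

CACHE_TTL = 300  # seconds
_cache: dict[str, tuple[float, object]] = {}  # key -> (expiry, value)
_locks: dict[str, asyncio.Lock] = {}

async def fetch_data(key: str):
    # Hypothetical slow backend call (~10 ms)
    await asyncio.sleep(0.01)
    return {"key": key, "fetched_at": time.time()}

async def cached_fetch(key: str, timeout: float = 5.0):
    entry = _cache.get(key)
    if entry and entry[0] > time.monotonic():
        return entry[1]  # cache hit
    # Per-key lock: only one coroutine refreshes a given key at a time
    lock = _locks.setdefault(key, asyncio.Lock())
    async with lock:
        # Re-check after acquiring the lock - another task may have filled it
        entry = _cache.get(key)
        if entry and entry[0] > time.monotonic():
            return entry[1]
        value = await asyncio.wait_for(fetch_data(key), timeout=timeout)
        _cache[key] = (time.monotonic() + CACHE_TTL, value)
        return value

async def main():
    # 10 concurrent callers should trigger only one backend fetch
    return await asyncio.gather(*(cached_fetch("users") for _ in range(10)))

results = asyncio.run(main())
print(len(results), "results from", len({r["fetched_at"] for r in results}), "backend fetch(es)")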
Next Steps
- → Testing MCP Servers — Load testing approaches
- → MCP Error Handling Patterns — Graceful degradation
- → MCP Guide Home — All tutorials
Written by Kai Gritun. Building tools for AI developers.