2025-07-10 · HuggingFace

Building the Hugging Face MCP Server

Source: HuggingFace · Date: 2025-07-10 · URL: https://huggingface.co/blog/building-hf-mcp

Summary

Technical architecture post: an engineering deep-dive on how Hugging Face built hf.co/mcp, its official remote MCP server. The key decision was to run Streamable HTTP in stateless, direct-response mode, chosen over the deprecated SSE transport and over stateful architectures. Authentication is handled per request, via HF_TOKEN or OAuth, with no server-side session state. Operational data from the first week: 164 distinct MCP clients connected; the ratio of control messages to actual tool calls was 100:1; and ~50% of clients used mcp-remote as a bridge.
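
To make "stateless, direct-response" concrete, here is a minimal sketch of a request handler in that style. It is not Hugging Face's implementation: the `echo` tool, the `verify_token` placeholder, and the simplified `tools/list` payload are assumptions for illustration. The point is that each POST carries its own credentials, nothing is stored between calls, and the reply is a single JSON body rather than an event stream.

```python
import json
from typing import Optional

# Hypothetical tool registry (illustrative only, not the real hf.co/mcp tools).
TOOLS = {"echo": lambda args: args.get("text", "")}


def verify_token(auth_header: Optional[str]) -> bool:
    """Placeholder check: accepts any non-empty bearer token.

    A real server would validate the token with its identity provider.
    """
    prefix = "Bearer "
    return bool(auth_header) and auth_header.startswith(prefix) and len(auth_header) > len(prefix)


def rpc_error(req_id, code: int, message: str) -> dict:
    """Build a JSON-RPC 2.0 error response."""
    return {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}


def handle_post(headers: dict, body: str) -> tuple[int, dict]:
    """Handle one JSON-RPC request using only what the request itself carries."""
    # Authentication happens on every request; no session lookup is involved.
    if not verify_token(headers.get("Authorization")):
        return 401, {"error": "unauthorized"}

    msg = json.loads(body)
    method, req_id = msg.get("method"), msg.get("id")

    if method == "tools/list":
        # Simplified: a real tool listing also carries descriptions and schemas.
        result = {"tools": [{"name": name} for name in TOOLS]}
    elif method == "tools/call":
        params = msg.get("params", {})
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return 200, rpc_error(req_id, -32602, "unknown tool")
        text = tool(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return 200, rpc_error(req_id, -32601, "method not found")

    # Direct response: one JSON object returned in the HTTP response body.
    return 200, {"jsonrpc": "2.0", "id": req_id, "result": result}
```

Because the handler keeps no per-client state, any server instance can answer any request, which is the usual motivation for a stateless design behind a load balancer.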

Implications

HF as open-source ML hub. By operating an official MCP server, HF positions the Hub as a first-class tool source for Claude, Cursor, and other MCP-enabled AI systems. The first-week figures (164 distinct clients, a 100:1 ratio of control messages to tool calls) are an early signal that MCP adoption is broad but shallow: many clients connect, but few invoke tools heavily. This matters for sizing infrastructure, because most request volume comes from protocol overhead rather than from tool execution.
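
As a rough illustration of what the 100:1 ratio implies for capacity planning, the sketch below computes the share of messages that are tool calls. The function name and the 5,000 requests-per-minute figure are hypothetical; only the 100:1 ratio comes from the post.

```python
def tool_call_share(control_per_tool_call: float) -> float:
    """Fraction of all messages that are tool calls, given control messages per tool call."""
    return 1.0 / (control_per_tool_call + 1.0)


share = tool_call_share(100)  # 100 control messages for every tool call
print(f"Tool calls as a share of all messages: {share:.2%}")

# Hypothetical load figure, purely for illustration.
total_requests_per_minute = 5_000
print(f"Expected tool calls per minute: {total_requests_per_minute * share:.1f}")
```

Under that ratio, fewer than 1 in 100 messages does tool work, so the cost of handling control messages largely determines how the serving tier should be sized.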

Transformers library trajectory: agent/tool thread. The architectural choices documented here (stateless operation, Streamable HTTP, direct responses) serve as a reference design for the MCP ecosystem. By publishing its production decisions openly, HF sets a pattern that is likely to shape how other MCP servers are built, particularly in the open-source ML community.
