Anthropic is restricting Pro/Max subscribers' access to Claude models on open agent platforms. But don't worry: Hugging Face offers great open models that keep agents up and running, in most cases at a fraction of the cost.
If you've been cut off and need to revive your OpenClaw, Pi, or OpenCode agent, you can move it to an open model in two ways:
- Use an open model served through Hugging Face Inference Providers.
- Run a fully local open model on your own hardware.
The hosted route is the fastest way to get back to a capable agent. If you want privacy, zero API costs, and complete control, the local route is the way to go.
To do this, tell Claude Code, Cursor, or your favorite agent, "Help me move my OpenClaw agent to a Hugging Face model," and link to this page.
Hugging Face Inference Providers
Hugging Face Inference Providers is an open platform that routes requests to open-source models hosted across multiple inference providers. If you want the strongest available model, or you don't have the hardware to run one yourself, this is the right choice.
First, create a Hugging Face access token at hf.co/settings/tokens. Then add that token to OpenClaw like this:
openclaw onboard --auth-choice huggingface-api-key
When prompted, paste your Hugging Face token; you'll then be asked to select a model.
We recommend GLM-5 for its excellent Terminal-Bench score, but there are thousands of models to choose from on the Hub.
You can update the Hugging Face model at any time by entering its repo_id in your OpenClaw settings:
{
  "agent": {
    "default": {
      "model": { "primary": "huggingface/zai-org/GLM-5:fastest" }
    }
  }
}
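If you want to sanity-check your token and model outside OpenClaw, you can call the Inference Providers router directly. A minimal sketch, assuming your token is exported as HF_TOKEN; the router exposes an OpenAI-compatible chat completions endpoint:

# Send a one-off chat request through the Inference Providers router.
curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-5:fastest",
    "messages": [{"role": "user", "content": "Reply with one word: ready?"}]
  }'

A JSON response containing a choices array confirms that your token and the model id from the config above are valid.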
Note: HF PRO subscribers get $2 of free monthly credits towards their Inference Providers usage.
Local setup
Running your model locally gives you complete privacy, zero API costs, and the ability to experiment without rate limits.
Install llama.cpp, a fully open-source library for running inference on modest hardware.
On macOS or Linux:
brew install llama.cpp
On Windows:
winget install llama.cpp
Start a local server, which includes a built-in web UI:
llama-server -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL
Here we're using Qwen3.5-35B-A3B, which runs well in 32 GB of RAM. If your requirements differ, check the hardware compatibility of any model you're interested in; there are thousands to choose from.
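Agent workloads tend to need long contexts and tool calling, so you may want to pass a few extra llama-server flags. A minimal sketch; the context size and port values are illustrative choices, not requirements:

# --jinja applies the model's chat template (tool-calling agents rely on this);
# -c sets the context window in tokens; --port matches the base URL used below.
llama-server -hf unsloth/Qwen3.5-35B-A3B-GGUF:UD-Q4_K_XL \
  --jinja -c 32768 --port 8080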
To point OpenClaw at the GGUF model now being served by llama.cpp, use a configuration like this:
openclaw onboard --non-interactive \
  --auth-choice custom-api-key \
  --custom-base-url "http://127.0.0.1:8080/v1" \
  --custom-model-id "unsloth-qwen3.5-35b-a3b-gguf" \
  --custom-api-key "llama.cpp" \
  --secret-input-mode plaintext \
  --custom-compatibility openai
Verify that the server is running and the model is loaded:
curl http://127.0.0.1:8080/v1/models
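To confirm end-to-end generation, you can also send an OpenAI-compatible chat request to the local server. A minimal sketch; llama-server serves a single model, so the model field below simply echoes the id used in the OpenClaw config:

# Request a short completion from the locally served model.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "unsloth-qwen3.5-35b-a3b-gguf",
    "messages": [{"role": "user", "content": "Say hello in five words."}]
  }'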
Which path should I choose?
If you want the fastest path back to a capable OpenClaw agent, use Hugging Face Inference Providers. If you want privacy, full local control, and no API charges, use llama.cpp.
Either way, you don't need a closed, hosted model to get OpenClaw back on its feet.

