New in llama.cpp: Model Management

By versatileai, December 12, 2025

The llama.cpp server now ships with a router mode, allowing you to dynamically load, unload, and switch between multiple models without restarting.

Note: The llama.cpp server is a lightweight, OpenAI-compatible HTTP server for running LLMs locally.

This feature was a popular request to bring Ollama-style model management to llama.cpp. We use a multi-process architecture where each model runs in its own process, so if one model crashes, other models are not affected.

Quick start

Start the server in router mode by launching it without specifying a model:

llama-server

This will auto-detect the model from the llama.cpp cache (LLAMA_CACHE or ~/.cache/llama.cpp). If you have previously downloaded models via llama-server -hf user/model, they are automatically available.

You can also point it at a local directory containing GGUF files:

llama-server --models-dir ./my-models

Features

  • Auto-detection: scans the llama.cpp cache (default) or a custom --models-dir folder for GGUF files
  • On-demand loading: models are loaded automatically the first time they are requested
  • LRU eviction: exceeding --models-max (default: 4) unloads the least recently used model
  • Request routing: the model field in the request determines which model handles it

Examples

Chat with a specific model

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

On the first request, the server automatically loads the model into memory (load time depends on the size of the model). Subsequent requests for the same model will be instantaneous since it is already loaded.
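
Since the server speaks the OpenAI API, the same request also works through any OpenAI-compatible client library. Below is a minimal sketch using the openai Python package; it assumes the package is installed, the router from the quick start is running on localhost:8080, and it reuses the example model name from above.

# Minimal sketch: chatting through the llama.cpp router with the `openai`
# Python package (the API key is unused by the local server but required
# by the client constructor).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-needed")

# The `model` field tells the router which model should handle the request;
# the router loads it on demand if it is not in memory yet.
response = client.chat.completions.create(
    model="ggml-org/gemma-3-4b-it-GGUF:Q4_K_M",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)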

List available models

curl http://localhost:8080/models

Returns all discovered models along with their status (loaded, loading, or unloaded).
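
For scripting, you can poll this endpoint before sending traffic. A small sketch with the requests package, assuming only that the endpoint returns JSON as described above (the exact field names may vary between llama.cpp versions, so the sketch just pretty-prints the raw response):

import json

import requests

# Fetch the router's model list and pretty-print the raw JSON, since the
# exact response fields may differ between llama.cpp versions.
response = requests.get("http://localhost:8080/models")
response.raise_for_status()
print(json.dumps(response.json(), indent=2))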

Load a model manually

curl -X POST http://localhost:8080/models/load \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model.gguf"}'

Unload a model to free up VRAM

curl -X POST http://localhost:8080/models/unload \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model.gguf"}'
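
These load/unload calls are easy to drive from a script, for example to warm a model up before traffic arrives and to release its VRAM afterwards. A sketch with the requests package, reusing the placeholder model name from the curl examples above:

import requests

BASE_URL = "http://localhost:8080"
MODEL = "my-model.gguf"  # placeholder name from the curl examples above

# Warm the model up before any chat traffic arrives...
requests.post(f"{BASE_URL}/models/load", json={"model": MODEL}).raise_for_status()

# ... run requests against the model here ...

# ...then unload it to release its VRAM when done.
requests.post(f"{BASE_URL}/models/unload", json={"model": MODEL}).raise_for_status()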

Main options

  • --models-dir PATH: directory containing GGUF files
  • --models-max N: maximum number of simultaneously loaded models (default: 4)
  • --no-models-autoload: disable automatic loading; models must be loaded with an explicit /models/load call

All model instances inherit settings from the router:

llama-server --models-dir ./models -c 8192 -ngl 99

All loaded models then use a context size of 8192 and full GPU offload. You can also define per-model settings using a preset file:

llama-server --model-preset config.ini

[my-model]
model = /path/to/model.gguf
ctx-size = 65536
temperature = 0.7

Also available in the web UI

The built-in web UI also supports model switching. Just select your model from the dropdown and it will load automatically.

Join the conversation

We hope this feature makes it easy to A/B test different model versions, run multi-tenant deployments, or simply switch models during development without restarting the server.

Have questions or feedback? Drop a comment below or open an issue on GitHub.
