Two years ago (!) we released swift-transformers with the goal of supporting Apple developers and helping them integrate local LLMs into their apps. A lot has changed since then (MLX and chat templates didn’t exist!), and we’ve learned how the community actually uses the library.
We want to double down on the use cases that benefit our community most and lay the foundations for the future. Spoiler alert: after this release, we will focus a lot on MLX and agentic use cases 🚀
What is swift-transformers?
swift-transformers is a Swift library designed to reduce friction for developers who want to run local models on Apple Silicon platforms, including iPhone. It covers the missing pieces that Core ML and MLX don’t provide but that are necessary for local inference. Concretely, it consists of the following components (a short sketch follows the list):
- Tokenizers. Preparing the inputs for a language model is surprisingly complicated. Our tokenizers Python and Rust libraries have accumulated a lot of experience here and are a foundation of the AI ecosystem, and we wanted to bring the same performance and ergonomics to Swift. The Swift version of Tokenizers needs to handle everything, including chat templates and agentic use!
- Hub. An interface to the Hugging Face Hub, where all the open models live. Models can be downloaded from the Hub and cached locally, with support for background, resumable downloads, model updates, and offline mode. It includes a subset of the features provided by the Python and JavaScript libraries, focusing on the tasks Apple developers need most (uploads, for instance, are not supported).
- Models and Generation. These are wrappers for LLMs converted to the Core ML format. Converting them is out of scope for the library (but there are a few guides available); once they are converted, these modules make it easy to run inference with them.
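To make this concrete, here is a minimal sketch of the Hub and Tokenizers modules working together. It is illustrative rather than taken from the documentation: the repo id, glob patterns, and prompt are placeholders, and the snapshot call mirrors how downstream projects typically use HubApi.

import Hub
import Tokenizers

// Rough sketch: download (and cache) model files from the Hub, then tokenize a prompt.
func downloadAndTokenize() async throws {
    // Hub: fetch files from the Hugging Face Hub into the local cache.
    let hub = HubApi()
    let repo = Hub.Repo(id: "mlx-community/Qwen2.5-7B-Instruct-4bit")
    let modelFolder = try await hub.snapshot(from: repo, matching: ["*.json", "*.safetensors"])
    print("Model files cached at \(modelFolder.path)")

    // Tokenizers: load the tokenizer for the same repo and encode some text.
    let tokenizer = try await AutoTokenizer.from(pretrained: "mlx-community/Qwen2.5-7B-Instruct-4bit")
    let inputIds = tokenizer.encode(text: "Hello from Swift!")
    print(inputIds)
}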
How does the community use it?
Most often, people use the Tokenizers or Hub modules. Some notable projects that rely on swift-transformers include:
- mlx-swift-examples, by Apple. Despite its name, it’s not just a collection of examples but a set of libraries you can use to run different types of models with MLX, including LLMs and VLMs (vision-language models). It’s a sort of Models and Generation library, but for MLX instead of Core ML, and it supports additional model types, such as embeddings and Stable Diffusion.
- WhisperKit, by Argmax. An open-source ASR (speech recognition) framework heavily optimized for Apple Silicon. It relies on the Hub and Tokenizers modules.
- Many other apps and demos, including FastVLM by Apple and our own SmolVLM2 native app.
What’s changing in v1.0
Version 1.0 signals package stability. Developers are building apps on top of swift-transformers, and this first major release acknowledges those use cases and brings the version number in line with that reality. It also lays the foundation for iterating with the community on the next set of features. Here are some of our favorite updates:
- Tokenizers and Hub are now first-class, top-level modules. Before 1.0, you had to depend on the whole package even if all you needed was, say, Tokenizers (see the Package.swift sketch after this list).
- We are proud to announce that we partnered with John Mai to create the next version of his outstanding Swift Jinja library, which powers chat templating. John’s work has been tremendously important for the community: he took it upon himself to provide a solid chat-template library that can keep growing as templates become more and more complicated. The new version is orders of magnitude faster (no joke) and now lives as swift-jinja.
- To further reduce the burden on downstream users, we removed the example CLI target and the swift-argument-parser dependency.
- Thanks to Apple’s contributions, we adopted the modern Core ML APIs, with support for stateful models (to facilitate KV caching) and the expressive MLTensor API. This removed thousands of lines of custom tensor-operation and math code.
- A lot of additional cruft was removed, reducing the API surface, lowering cognitive load, and letting us iterate faster.
- Tests are better, faster, stronger.
- Documentation comments were added to the public API.
- Swift 6 is fully supported.
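For example, here is what depending on just the tokenization and Hub pieces could look like in a SwiftPM manifest. This is a minimal sketch, not copied from the official docs: the product names assume they match the top-level modules mentioned above, and the platform and version requirements are illustrative.

// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyLocalLLMApp",
    platforms: [.macOS(.v14), .iOS(.v17)],  // illustrative minimum versions
    dependencies: [
        .package(url: "https://github.com/huggingface/swift-transformers", from: "1.0.0")
    ],
    targets: [
        .executableTarget(
            name: "MyLocalLLMApp",
            dependencies: [
                // Depend only on the modules you need instead of the whole package:
                .product(name: "Tokenizers", package: "swift-transformers"),
                .product(name: "Hub", package: "swift-transformers")
            ]
        )
    ]
)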
Version 1.0 comes with breaking API changes. If you are a user of Tokenizers or Hub, however, we don’t expect you to run into major issues. If you use the Core ML components of the library, please reach out so we can support you during the migration. We are also preparing a migration guide to add to the documentation.
Usage examples
Here’s how to format tool-calling inputs for an LLM using Tokenizers:
import Tokenizers

let tokenizer = try await AutoTokenizer.from(pretrained: "mlx-community/Qwen2.5-7B-Instruct-4bit")

let weatherTool: [String: Any] = [
    "type": "function",
    "function": [
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": [
            "type": "object",
            "properties": [
                "location": ["type": "string", "description": "The city and state"]
            ],
            "required": ["location"]
        ] as [String: Any]
    ] as [String: Any]
]

let tokens = try tokenizer.applyChatTemplate(
    messages: [["role": "user", "content": "What's the weather in Paris?"]],
    tools: [weatherTool]
)
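To sanity-check what the template produced, you can decode the token ids back into text. This is a small illustrative follow-up rather than part of the original example:

// Decode the formatted prompt to inspect what the model will actually receive.
let prompt = tokenizer.decode(tokens: tokens)
print(prompt)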
Check the README and the Examples folder for additional examples.
What’s coming next
Honestly, we don’t know for sure. We do know that we’re very interested in exploring MLX: it’s usually the go-to approach for developers getting started with ML in native apps, and we want to make that experience as seamless as possible. Along the lines of better integration of LLMs and VLMs with mlx-swift-examples, we are considering support for the pre- and post-processing operations developers frequently encounter.
We’re also very excited about agentic use cases in general, and MCP in particular, which we believe will expose system resources to local workflows.
If you’d like to follow us on this journey or share your ideas, please reach out on our social networks or in the repository.
We couldn’t have done it without you
We are extremely grateful to all the contributors and users of the library for your help and feedback. We love you all and can’t wait to keep working with you. ❤️