dottxt and Hugging Face are pleased to announce that they are collaborating on outlines-core, a Rust port of Outlines' core algorithms for structured generation. On top of letting you use Outlines to get reliable output from LLMs, this Rust port brings several additional benefits to Outlines users:

- Speed: users can expect index compilation to be about 2x faster.
- Separation of concerns: it's now easier to incorporate structured generation into other libraries; outlines-core is extremely lightweight.
- Portability: having the core algorithms in Rust allows bindings for languages other than Python.
These improvements not only boost performance for existing Outlines users, but also significantly expand the ways structured generation can be incorporated into LLM workflows. outlines-core has been published and integrated into outlines, and version 0.1.0 of the Python bindings has been released. The repository can be found here.
A quick introduction to structured generation 🧑🎓
Structured generation means that the LLM's output is guaranteed to follow a desired format. This could be JSON, a Pydantic model, a regular expression, or a context-free grammar. Crucially, structured generation makes it impossible to generate "wrong" tokens.
Let's look at a very simple example: the LLM must generate a boolean, "true" or "false", and nothing more. For illustration purposes, assume the LLM generates characters instead of tokens. The first character has to be ", so there is no need to sample at all; we can skip the forward pass. For the second character, we don't have to sample from every possible character either: the LLM only chooses between t and f. That's all that is needed.
From there, whichever path was taken, there is only one valid next character. If the LLM chose t as the first letter, it must be followed by r, u, and e; similarly, if it chose f, then a, l, s, and e follow. And the closing " is the last character either way. Of course, there is more going on under the hood; for a more detailed explanation, we recommend this blog post by dottxt and the related paper on arXiv.
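The character-level walk above can be sketched in a few lines of Python. This is a toy illustration of the idea, not the actual Outlines implementation; the `allowed_next` and `generate` helpers are invented for this example:

```python
# Toy sketch of character-level structured generation (not the outlines API):
# restrict each "sampling" step to characters that can extend a valid output.
# The model is faked with a picking function; only the masking logic matters.

VALID = ['"true"', '"false"']  # the only outputs we allow

def allowed_next(prefix: str) -> set[str]:
    """Characters that keep `prefix` a prefix of some valid output."""
    return {v[len(prefix)] for v in VALID
            if v.startswith(prefix) and len(v) > len(prefix)}

def generate(pick=min) -> str:
    out = ""
    while out not in VALID:
        chars = allowed_next(out)
        if len(chars) == 1:          # forced: no need to run the model at all
            out += chars.pop()
        else:                        # a real LLM would sample among `chars`
            out += pick(chars)
    return out

print(generate(max))  # choosing 't' at the branch yields "true"
print(generate(min))  # choosing 'f' at the branch yields "false"
```

Note that most steps never consult the "model" at all: only the single branch between t and f requires a choice, which is exactly why structured generation can skip forward passes.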
Why is it important?
It may not be immediately obvious how powerful structured generation is. The first use case many people think of is: "Great, now that the LLM can return valid JSON, we can treat it as an API and reliably serialize/deserialize its output." But that's only scratching the surface. Once you start looking, structure is everywhere, even in completely unexpected places, such as the GSM8K benchmark.
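For instance, GSM8K answers conventionally end with a `#### <number>` marker, and that convention is itself a structure a regular expression can pin down. Here is a toy sketch (not an official GSM8K grammar) of extracting the final answer with such a pattern:

```python
import re

# GSM8K-style answers interleave free-form reasoning with a final-answer
# marker, e.g. "#### 18". Making that structure explicit as a regex means
# generation or evaluation can rely on it instead of ad-hoc string parsing.
GSM8K_ANSWER = re.compile(r"(?s).*####\s*(-?\d[\d,]*)\s*$")

def final_answer(completion: str):
    """Return the number after '####', or None if the format is violated."""
    m = GSM8K_ANSWER.fullmatch(completion)
    return m.group(1) if m else None

print(final_answer("She sells 16 - 3 - 4 = 9 eggs a day. #### 18"))  # -> 18
print(final_answer("no marker here"))                                # -> None
```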
These are just a few examples of what structured generation can do.
And, perhaps more surprisingly, evaluations become less sensitive to the specific prompts and the number of few-shot examples used. Beyond the tricks it enables, structure also improves performance; the dottxt blog has many great articles with performance benchmarks.
Why rewrite in Rust? 🦀
Speed
When you hear "rewriting it in Rust", the first thing that comes to mind is probably performance, and yes, that applies to outlines-core as well. Although some important parts have not yet been migrated to Rust, we are already seeing an average 2x improvement in index compilation speed.
Prior to the port, Outlines used Numba to accelerate index construction. While Numba is fast (its runtime performance is comparable to Rust's), the JIT compilation of the Numba functions added noticeable latency on first execution, which had been a source of frustration for many users. With Rust, the index-construction functions are compiled ahead of time, so there is no delay the first time they run. This matters less in production (where the first run can happen as part of deployment), but it makes a big difference during experimentation.
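To make "index compilation" concrete, here is a toy Python sketch of the precomputation being paid up front: for each state of the output automaton, record which vocabulary tokens are allowed and which state they lead to. States here are plain string prefixes and the vocabulary is invented; a real index operates on a regex-derived DFA and the model's actual tokenizer:

```python
# Toy sketch of "index compilation": precompute, once, which vocabulary tokens
# are valid from each state of the output automaton. At generation time,
# masking is then a cheap table lookup. All names here are illustrative.

VALID = ['"true"', '"false"']
VOCAB = ['"', 'tr', 'ue', 'fal', 'se"', 'true', 'x']  # toy multi-char tokens

def compile_index():
    # States are the valid prefixes of the allowed outputs.
    states = {v[:i] for v in VALID for i in range(len(v) + 1)}
    index = {}
    for state in states:
        for tok in VOCAB:
            nxt = state + tok
            if any(v.startswith(nxt) for v in VALID):
                index.setdefault(state, {})[tok] = nxt
    return index

INDEX = compile_index()              # paid once, up front
print(sorted(INDEX['"']))            # tokens allowed after the opening quote
```

Paying this cost once ahead of time, rather than lazily on first use, is exactly the difference users feel between Numba's JIT warm-up and Rust's ahead-of-time compilation.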
Safety and reliability
One of the main motivations for rewriting Outlines in Rust is the emphasis on safety and reliability the language provides. Rust's strong static typing, combined with its ownership model, eliminates entire classes of bugs, such as null pointer dereferences and data races in concurrent code. This results in more robust and secure software.
Safety matters a great deal for Outlines. Structured generation often involves complex data structures and operations, especially when interfacing with high-performance inference engines. By leveraging Rust's safety guarantees, we reduce the risk of runtime errors and undefined behavior that can result from memory mismanagement.
Additionally, Rust’s compile-time checks encourage developers to write cleaner, more maintainable code. This improves the current codebase and makes future development more efficient. New contributors can be onboarded faster, and code can be easily audited and verified for correctness.
Separation of concerns
Outlines is designed to do more than just provide the core algorithms for structured generation. Among other things, it ships integrations with libraries such as transformers, which pulls in many dependencies. Separating the core algorithms from the Outlines library means that any library wanting structured generation can get it by importing a very lightweight package. One can therefore imagine libraries like transformers and llama-cpp-python integrating structured generation directly in the near future, while the dottxt team stays focused on the core algorithms.
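To illustrate how small that surface area can be, here is a hypothetical sketch of a host-library integration: the core only needs to answer "which tokens are allowed from this state", and the host applies the answer as a logits mask. None of these names are the real outlines-core API:

```python
# Hypothetical sketch of how a host inference library could integrate a
# lightweight structured-generation core: the core exposes an allowed-tokens
# table, and the host masks its logits with it before sampling.
# Class and method names are invented for illustration.

NEG_INF = float("-inf")

class ToyIndex:
    """Stands in for a compiled index: allows token 1, then token 2."""
    def __init__(self):
        self.table = {0: {1: 1}, 1: {2: 2}}  # state -> {token_id: next_state}

    def allowed(self, state):
        return self.table.get(state, {})

def mask_logits(logits, index, state):
    """Set every disallowed token's logit to -inf; sampling then stays valid."""
    allowed = index.allowed(state)
    return [x if i in allowed else NEG_INF for i, x in enumerate(logits)]

idx = ToyIndex()
print(mask_logits([0.3, 0.1, 2.0], idx, state=0))  # only token 1 survives
```

A host library only has to call the masking hook once per decoding step, which is why a dependency-free core makes direct integrations practical.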
Portability
Most LLM training code is written in Python, but inference is a different story: it runs on different devices and specialized servers, and is written in a variety of programming languages. This makes portability important for structured generation as well. With the core functionality of Outlines written in Rust, we can now create bindings to other languages.
For example, this port allows for smoother integration into Text Generation Inference (TGI). TGI's server logic is written in Rust, and we want to avoid calling into Python wherever possible. It also means that libraries like mistral.rs, and models implemented with candle, can benefit from Outlines' performance and functionality.
In the future, we will consider JS/TS bindings so that Outlines can be used with transformers.js, and potentially Swift bindings to use Outlines natively on Apple devices. For now, though, the focus is on the Python bindings and on continuing to round out outlines-core's feature set by extending support for the JSON Schema specification.
Contribute
Do you enjoy structured generation, parsers, and making sure LLMs only output valid JSON? Star the library, tweet about it, and join in and contribute! Share your work on Twitter and with the dottxt and Hugging Face communities.