Try the model
Overworld Stream: https://overworld.stream
What is Waypoint-1?
Waypoint-1 is Overworld’s real-time interactive video diffusion model, which can be controlled and prompted via text, mouse, and keyboard. You can give the model a few frames, let it run, and have it create a world that you can step into and interact with.
The backbone of the model is a frame-causal rectified flow transformer trained on 10,000 hours of diverse video game footage paired with control inputs and text captions. Waypoint-1 is a latent model, meaning it is trained on compressed frames.
The standard among existing world models is to take a pre-trained video model and fine-tune it with brief, simplified control inputs. In contrast, Waypoint-1 was trained from the start with a focus on interactive experiences. With other models, the controls are simple: you can move and rotate the camera once every few frames, and you experience severe latency. With Waypoint-1, the controls are not restricted at all. You can move the camera freely with the mouse and press any key on the keyboard, all with zero latency. Each frame is generated with your controls as context. Additionally, the model runs fast enough to provide a seamless experience even on consumer hardware.
How was it trained?
Waypoint-1 was pre-trained with diffusion forcing, a technique in which a model learns to denoise future frames given past frames. A causal attention mask is applied so that a token in any frame can attend only to tokens in its own frame or in past frames, never to future frames. Noise is added to each frame at a random, independent level, so the model learns to denoise each frame separately. During inference, you can then denoise new frames one at a time, which lets you generate a procedural stream of new frames.
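The frame-level causal mask described above can be sketched in a few lines. This is a minimal illustration in plain Python, not Waypoint-1's actual implementation: the function name `frame_causal_mask` and the toy sizes are assumptions made for the example.

```python
def frame_causal_mask(num_frames: int, tokens_per_frame: int) -> list[list[bool]]:
    """Return mask[q][k], True when query token q may attend to key token k.

    Tokens are laid out frame by frame. A token may attend to every token in
    its own frame and in earlier frames, but to none in later frames.
    """
    total = num_frames * tokens_per_frame
    mask = []
    for q in range(total):
        q_frame = q // tokens_per_frame
        # Allowed when the key's frame index is not later than the query's.
        mask.append([(k // tokens_per_frame) <= q_frame for k in range(total)])
    return mask


# Toy example: 3 frames of 2 tokens each ("x" = may attend, "." = masked).
mask = frame_causal_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Unlike a token-level causal mask, attention within a frame is bidirectional here: the first token of a frame can see the last token of the same frame.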
Diffusion forcing provides a strong baseline, but randomly noising every frame is misaligned with a frame-by-frame autoregressive rollout. This mismatch between training and inference causes errors to accumulate and makes long rollouts noisy. To address this problem, we post-train with self-forcing, a technique that trains the model to produce realistic outputs under a regime that matches inference behavior. Self-forcing via DMD has the added benefits of one-pass CFG and few-step denoising.
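One way to picture the mismatch is to compare the per-frame noise levels seen in each regime. The sketch below is only an illustration under assumed conventions (noise level 0.0 means a clean frame, 1.0 means pure noise); both function names are hypothetical and nothing here is taken from the Waypoint-1 training code.

```python
import random


def diffusion_forcing_noise_levels(num_frames: int, rng: random.Random) -> list[float]:
    """Training regime: every frame gets its own independent noise level."""
    return [rng.random() for _ in range(num_frames)]


def rollout_noise_levels(num_frames: int) -> list[float]:
    """Inference regime: earlier frames are already generated (level 0.0),
    and only the newest frame starts from pure noise (level 1.0)."""
    return [0.0] * (num_frames - 1) + [1.0]


rng = random.Random(0)
print("training :", [round(x, 2) for x in diffusion_forcing_noise_levels(4, rng)])
print("inference:", rollout_noise_levels(4))
```

During training the context frames are noisy to varying degrees, while during a rollout the context consists of the model's own finished frames. Post-training under the second pattern is what brings the training regime in line with inference.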
Inference library: WorldEngine
WorldEngine is Overworld’s high-performance inference library for interactive world model streaming. It provides the core tools for building inference applications in pure Python and is optimized for low latency, high throughput, extensibility, and developer simplicity. The runtime loop is designed with interactivity in mind: it consumes context frame images, keyboard/mouse inputs, and text, and outputs image frames for real-time streaming.
With Waypoint-1-Small (2.3B) running on an RTX 5090, WorldEngine sustains up to about 30,000 token-passes per second (counting a single denoising pass, at 256 tokens per frame), which is enough for 30 FPS at 4 denoising steps or 60 FPS at 2 steps.
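The relationship between these figures is simple arithmetic, sketched below. The helper names are made up for this example; the only inputs are the numbers quoted above, and the model is assumed to run one pass over every token of a frame per denoising step.

```python
def token_passes_needed(fps: int, tokens_per_frame: int, denoise_steps: int) -> int:
    """Token-passes per second required to hit a target frame rate."""
    return fps * tokens_per_frame * denoise_steps


def frames_per_second(token_passes_per_sec: float, tokens_per_frame: int, denoise_steps: int) -> float:
    """Frame rate achievable with a given token-pass budget."""
    return token_passes_per_sec / (tokens_per_frame * denoise_steps)


print(token_passes_needed(30, 256, 4))    # 30720
print(token_passes_needed(60, 256, 2))    # 30720
print(frames_per_second(30_000, 256, 4))  # 29.296875
print(frames_per_second(30_000, 256, 2))  # 58.59375
```

Both operating points need the same budget of about 30k token-passes per second, so halving the number of denoising steps doubles the frame rate.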
Performance is achieved through four targeted optimizations:
AdaLN feature caching: avoids recomputing the AdaLN conditioning projections by caching and reusing them for as long as the prompt conditioning and timesteps stay the same between forward passes.
Static rolling KV cache + Flex Attention.
Matmul fusion: a standard inference optimization using fused QKV projections.
Torch compile: using torch.compile(fullgraph=True, mode="max-autotune", dynamic=False).
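Two of these ideas, conditioning caching and a fixed-size rolling cache, can be illustrated with a toy sketch in plain Python. It makes no claim about how WorldEngine implements them: the class names `ConditioningCache` and `RollingFrameCache`, the stand-in `fake_adaln` function, and the timestep values are all hypothetical.

```python
class ConditioningCache:
    """Recompute conditioning features only when (prompt, timestep) changes."""

    def __init__(self, compute_fn):
        self._compute_fn = compute_fn
        self._cache = {}
        self.misses = 0  # how many times compute_fn actually ran

    def get(self, prompt: str, timestep: float):
        key = (prompt, timestep)
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._compute_fn(prompt, timestep)
        return self._cache[key]


class RollingFrameCache:
    """Fixed-size ring buffer: slots are preallocated once and the oldest
    entry is overwritten in place, so the cache never grows or reallocates."""

    def __init__(self, capacity: int):
        self._slots = [None] * capacity
        self._count = 0  # total number of frames ever appended

    def append(self, frame_kv) -> None:
        self._slots[self._count % len(self._slots)] = frame_kv
        self._count += 1

    def window(self) -> list:
        """Cached entries ordered from oldest to newest."""
        n = len(self._slots)
        if self._count <= n:
            return self._slots[: self._count]
        start = self._count % n
        return self._slots[start:] + self._slots[:start]


def fake_adaln(prompt: str, timestep: float) -> float:
    # Stand-in for an expensive conditioning projection.
    return len(prompt) * timestep


cond = ConditioningCache(fake_adaln)
frames = RollingFrameCache(capacity=3)
for frame_idx in range(5):
    for t in (1.0, 0.75, 0.5, 0.25):  # the same 4 timesteps for every frame
        cond.get("valley with goats", t)
    frames.append(frame_idx)

print(cond.misses)      # 4
print(frames.window())  # [2, 3, 4]
```

Because every frame reuses the same prompt and the same small set of timesteps, the conditioning is computed once per timestep rather than once per forward pass. A fixed-capacity buffer also keeps tensor shapes constant, which is generally what a compile setting such as dynamic=False expects.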
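from world_engine import WorldEngine, CtrlInput

# Create the inference engine
engine = WorldEngine("Overworld/Waypoint-1-Small", device="cuda")

# Set the text prompt
engine.set_prompt("A game where you raise goats in a beautiful valley")

# Optional: force the next frame to be a specific image (uint8_img must be provided by the caller)
img = engine.append_frame(uint8_img)

# Generate three frames, each conditioned on a controller input
for controller_input in (
    CtrlInput(button={48, 42}, mouse=(0.4, 0.3)),
    CtrlInput(mouse=(0.1, 0.2)),
    CtrlInput(button={95, 32, 105}),
):
    img = engine.gen_frame(ctrl=controller_input)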
Build with World Engine
The world_engine hackathon will be held on January 20, 2026, and you can register your interest here. Teams of 2-4 people are welcome, and the prize is a 5090 GPU, awarded on the spot. We’d love to see what you come up with to extend world_engine. It is also a great event for meeting like-minded founders, engineers, hackers, and investors. Join us on January 20th at 10am PT for 8 hours of friendly competition.
Stay in touch

