Midjourney has released the first version of its AI-powered video generation model, which lets users create short animated clips from images on the platform. The tool is available on the web and through Midjourney's Discord server, but currently requires a paid subscription.
This early version lets users generate 5-second clips from images they create on or upload to the platform. After generating an image, the user is presented with an “Animation” button that starts the prompt-based animation process. By default, the system uses a general prompt to add motion, but a manual option accepts a custom description of the movement. Users can also supply a starting image to guide the animation.
Midjourney lets users extend an animation up to four times, by 4 seconds each time, for videos lasting up to 21 seconds. The platform offers both high-motion and low-motion modes, allowing users to control whether the subject, the camera, or both are animated.
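The 21-second ceiling follows from simple arithmetic, reading the figures above as a 5-second base clip plus up to four extensions of 4 seconds each. The short Python sketch below works through that reading; the constant and function names are illustrative and are not part of any Midjourney API.

```python
# Illustrative arithmetic only; names are hypothetical, not a Midjourney API.
BASE_CLIP_SECONDS = 5   # length of the initial clip
EXTENSION_SECONDS = 4   # length added by each extension
MAX_EXTENSIONS = 4      # maximum number of extensions


def clip_duration(extensions: int) -> int:
    """Return the total clip length in seconds after the given number of extensions."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return BASE_CLIP_SECONDS + extensions * EXTENSION_SECONDS


# Print the total length for every allowed number of extensions.
for n in range(MAX_EXTENSIONS + 1):
    print(f"{n} extension(s): {clip_duration(n)} seconds")
```

With all four extensions applied, the total comes to 5 + 4 × 4 = 21 seconds, matching the stated maximum.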
Pricing is tied to GPU time, with subscriptions starting at $10 per month for 3.3 hours of fast GPU usage. For video, Midjourney converts this into a per-second cost and estimates that a video job costs about eight times as much as generating a single image.
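To put these figures in perspective, the rough Python sketch below derives the implied price of fast GPU time on the entry plan and applies the eight-fold multiplier to an image job. The GPU time consumed by a single image job is not given in the article, so `image_gpu_minutes` is a purely hypothetical input, and all function names are illustrative.

```python
# Back-of-the-envelope estimate; not an official Midjourney price calculator.
PLAN_PRICE_USD = 10.0          # entry subscription price per month (from the article)
FAST_GPU_HOURS = 3.3           # fast GPU time included in that plan (from the article)
VIDEO_TO_IMAGE_COST_RATIO = 8  # "about eight times" an image job (from the article)


def usd_per_gpu_hour() -> float:
    """Implied price of one hour of fast GPU time on the entry plan."""
    return PLAN_PRICE_USD / FAST_GPU_HOURS


def image_job_cost_usd(image_gpu_minutes: float) -> float:
    """Cost of one image job, given a hypothetical GPU time in minutes."""
    return usd_per_gpu_hour() * image_gpu_minutes / 60.0


def video_job_cost_usd(image_gpu_minutes: float) -> float:
    """Cost of one video job, assuming it costs eight times an image job."""
    return VIDEO_TO_IMAGE_COST_RATIO * image_job_cost_usd(image_gpu_minutes)


if __name__ == "__main__":
    assumed_minutes = 1.0  # hypothetical GPU time per image job, not from the article
    print(f"Implied price per GPU hour: ${usd_per_gpu_hour():.2f}")
    print(f"Image job (assumed): ${image_job_cost_usd(assumed_minutes):.3f}")
    print(f"Video job (assumed): ${video_job_cost_usd(assumed_minutes):.3f}")
```

Only the $10 price, the 3.3 hours, and the eight-fold ratio come from the reported figures; the per-image GPU time should be replaced with a measured value before drawing any conclusions.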
“This is just a stepping stone,” wrote David Holz, founder of Midjourney, in a post announcing the feature. He added that the company is working toward more sophisticated models that could eventually enable real-time open-world simulations.
The release comes amid legal tensions. Midjourney is currently facing a lawsuit from Disney and Universal, which have expressed concern about the company’s video ambitions. The complaint singles out video generation in particular and describes Midjourney as a “virtual vending machine” for unauthorized reproductions of copyrighted works. The studios claim that training the company’s models likely infringes their intellectual property.
With this release, Midjourney enters the AI video generation space, joining a growing list of technology companies that includes OpenAI, Google, and Meta. Each has introduced a tool that converts text prompts to video. Competition to build next-generation content creation tools is accelerating.