First issue
The AI space moves so fast that it's hard to believe that just one year ago, we struggled to generate people with the right number of fingers.
Recent years have been extremely important for open source models and tools for artistic usage. AI tools for creative expression have never been easier to access, and we've only scratched the surface. Let's look back at the major milestones, tools, and breakthroughs of AI & Art in 2024, and at the ones coming in 2025 (spoiler: a new monthly roundup).
Table of contents
Major releases of 2024
What were the outstanding releases of creative AI tools in 2024? Let's recap the major open source developments, focusing on popular tasks in creative and artistic fields such as image and video generation.
Image generation
More than two years after the release of the OG Stable Diffusion, image generation with open source models spans a rich ecosystem of models and techniques for text-to-image generation, image editing, and controlled image generation. It's safe to say that open source models now give closed source models a run for their money.
Text-to-image generation
2024 was the year of a paradigm shift in diffusion models: from the conventional UNet-based architecture to diffusion transformers (DiT), together with an objective switch to flow matching.
TL;DR: diffusion models and Gaussian flow matching are equivalent. Flow matching proposes a different parameterization of the network output, a vector field, compared to the parameterizations commonly used in earlier diffusion models.
If you're interested in learning more about the connection between flow matching and diffusion models, we recommend this wonderful blog post by Google DeepMind.
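To make the objective concrete, here is a minimal numpy sketch of the flow matching regression target for the common linear interpolation path between noise and data: the network is trained to predict the path's velocity, x1 - x0, as a vector field. This is a toy illustration of the idea, not the training code of any particular model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1D "dataset": x1 are data samples, x0 is Gaussian noise.
x1 = rng.normal(loc=3.0, scale=0.5, size=(1024, 1))
x0 = rng.normal(loc=0.0, scale=1.0, size=(1024, 1))
t = rng.uniform(size=(1024, 1))  # random timesteps in [0, 1]

# Linear interpolation path between noise and data.
xt = (1.0 - t) * x0 + t * x1

# Flow matching regression target: the velocity of the path,
# d(xt)/dt = x1 - x0, i.e. the vector field the network should output at (xt, t).
target_velocity = x1 - x0

def flow_matching_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target vector fields."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

# An oracle predicting the target exactly has zero loss;
# a model predicting zeros does not.
print(flow_matching_loss(target_velocity, target_velocity))  # 0.0
print(flow_matching_loss(np.zeros_like(target_velocity), target_velocity) > 0.0)  # True
```

In a real model, `target_velocity` is regressed by a neural network conditioned on `xt` and `t`; sampling then integrates the learned vector field from noise to data.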
Back to practice: the first announcement of this shift was Stable Diffusion 3 by Stability AI, but the first open source model with a DiT architecture was Hunyuan-DiT.
This trend continued with the releases of AuraFlow, Flux.1, and Stable Diffusion 3.5.
It's safe to say that the release of Flux.1 was one of the most significant moments for open source image generation models (and not just in the past year). Flux.1 [dev] surpassed popular closed source models such as Midjourney v6.0 and DALL·E 3 (HD) on various benchmarks, setting a new state of the art.
Personalization and stylization
A positive side effect of the progress in image models is a significant improvement in personalization methods for text-to-image models and in the quality of the images they generate.
Back in August 2022, transformative works such as Textual Inversion and DreamBooth made it possible to teach text-to-image models new concepts, greatly expanding what could be done with them. These opened the door to a stream of further improvements and enhancements building on top of these methods (such as LoRA for diffusion models).
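The core idea behind LoRA can be sketched in a few lines of numpy: instead of fine-tuning a full weight matrix, a trainable low-rank update B @ A is added on top of the frozen pretrained weight. This is a conceptual sketch with made-up dimensions, not the implementation used by any particular library.

```python
import numpy as np

rng = np.random.default_rng(0)

d_out, d_in, r, alpha = 64, 64, 4, 4.0

# Frozen pretrained weight of one layer.
W = rng.normal(size=(d_out, d_in))

# LoRA factors: a trainable low-rank update B @ A.
A = rng.normal(size=(r, d_in)) * 0.01  # down-projection
B = np.zeros((d_out, r))               # up-projection, zero-init so training starts at W

def lora_forward(x, W, A, B, alpha, r):
    """Layer output with the low-rank LoRA update applied to the frozen weight."""
    return x @ (W + (alpha / r) * (B @ A)).T

x = rng.normal(size=(8, d_in))
# With B zero-initialized, the adapted layer initially matches the base layer.
print(np.allclose(lora_forward(x, W, A, B, alpha, r), x @ W.T))  # True

# Trainable parameters: r * (d_in + d_out) instead of d_in * d_out.
print(A.size + B.size, W.size)  # 512 4096
```

Only `A` and `B` are trained, which is why LoRA checkpoints are small enough to share and stack on community hubs.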
However, the maximal quality of a fine-tuned model is naturally bounded by the quality of the base model being fine-tuned. In that sense, Stable Diffusion XL cannot be overlooked: its release was also a significant marker for the personalization of open source image generation models. A testament to this is that many popular techniques and models for personalized and controlled generation are still based on SDXL to this day. This can be attributed to SDXL's advanced capabilities (relative to models of similar quality released after it) and to the growing understanding of the semantic roles of the various components in the diffusion model architecture.
What can we achieve without optimization?
Cue the rain of zero-shot techniques: 2024 was definitely the year when generating high-quality portraits from a reference photo became possible without any optimization. Training-free techniques such as IP-Adapter FaceID, InstantID, and PhotoMaker demonstrated capabilities competitive with fine-tuned models.
Similarly, tasks such as image editing and controlled generation with canny/depth/pose constraints benefited from the improved quality of base models and from the community's growing understanding of the semantic roles of different components (e.g., in the form of ControlNets and LoRAs).
So what's next? As the paradigm shifts to DiTs and flow matching, newer efforts have been trying to use DiT-based models such as Flux and SD3.5 for the same purposes. But so far, despite some excellent results, they do not yet match the quality achieved with SDXL-based techniques relative to the quality of the underlying base model. This may be due to our comparatively limited understanding of the semantic roles of the different components of DiTs versus UNets. 2025 may be the year these roles in DiTs are identified, unlocking more of the potential of the next generation of image generation models.
Video generation
In contrast to image generation, video generation still has a way to go. However, it's safe to say that we're very far from where we were a year ago. While we're all about open source, credit for a significant leap in AI video generation goes (partially) to OpenAI's Sora, which fundamentally changed expectations of what video models can do. As FOFR put it well in AI video's Stable Diffusion moment (recommended reading), everyone understood what was now possible.
Recently we've seen a steady increase in open source video generation models, such as CogVideoX, Mochi, Allegro, LTX Video, and HunyuanVideo. Video generation is more difficult than image generation, since it requires not only per-frame quality but also consistency and coherent motion over time. In addition, video generation demands substantial compute and memory resources, leading to significant generation latency. This often hinders local usage, and many of the new open video models would be inaccessible on consumer hardware without extensive memory optimization and quantization approaches that affect both inference latency and the quality of the generated videos. Nevertheless, the open source community has made amazing progress, as recently described in this blog post on the state of open video generation models.
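To see why quantization helps fit these models on consumer hardware, here is a toy numpy sketch of symmetric per-tensor int8 quantization: weights shrink 4x in memory (float32 to int8) at the cost of a bounded reconstruction error. Real libraries use more sophisticated schemes (per-channel scales, 4-bit formats, activation handling); this only illustrates the basic trade-off.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy weight tensor standing in for one layer of a large video model.
weights = rng.normal(scale=0.1, size=(1024, 1024)).astype(np.float32)

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ q * scale."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

q, scale = quantize_int8(weights)
reconstructed = dequantize(q, scale)

# 4x smaller in memory (float32 -> int8), with rounding error at most scale / 2.
print(weights.nbytes // q.nbytes)  # 4
print(float(np.max(np.abs(weights - reconstructed))) <= scale / 2 + 1e-6)  # True
```

The same idea, applied layer by layer across billions of parameters, is what brings multi-gigabyte video models within reach of consumer GPUs.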
This means that experimenting with and building on open source video models is still out of reach for many community members, but it also suggests that major progress can be expected in 2025.
Audio generation
Audio generation has advanced significantly in the past year, going from simple sounds to complete songs with lyrics. Challenges remain: audio signals are complex and multifaceted, requiring more sophisticated modeling than text or images, and training data is comparatively scarce. Even so, 2025 is already shaping up to be an innovative year for audio models, with remarkable releases in January alone: three new text-to-speech models, Kokoro, LLaSA TTS, and OuteTTS 0.3, and two new music models, JASCO and YuE. At this pace, we can expect more exciting developments in the audio space throughout the year.
This song was generated with YuE 🤯
Creative tools of 2024
The beauty of open source is that the community can experiment, find new uses for existing models and pipelines, improve them, and build new tools together. Many of the AI tools that gained popularity this year are the result of community efforts.
Here are some of our favorites:
Flux fine-tuning
Many of the amazing Flux fine-tunes created last year were trained thanks to ostris's AI-Toolkit.
Face to All
Inspired by fofr's face-to-many, Face to All combines the viral InstantID model with added ControlNet depth constraints and community fine-tuned SDXL LoRAs to create training-free, high-quality stylized portraits.
Flux style shaping
Based on Nathan Shipley's ComfyUI workflow, Flux style shaping combines Flux [dev] Redux and Flux [dev] Depth for style transfer and optical illusion creation.
Outpainting with diffusers
Diffusers Image Outpaint uses the Stable Diffusion XL Fill pipeline together with the SDXL Union ControlNet to seamlessly extend input images.
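A common preprocessing step in outpainting tools is to place the original image on a larger canvas and build a mask marking the new border as the region for the model to fill. Here is a minimal numpy sketch of that step; the mask convention (255 = generate, 0 = keep) and the `pad` parameter are illustrative assumptions, not the exact preprocessing of the Diffusers Image Outpaint space.

```python
import numpy as np

def prepare_outpaint_inputs(image, pad):
    """Place `image` (H, W, 3) on a larger canvas and build the mask that
    marks the newly added border as the region to fill.

    Assumed convention: mask value 255 = area to generate, 0 = area to keep.
    """
    h, w, c = image.shape
    canvas = np.zeros((h + 2 * pad, w + 2 * pad, c), dtype=image.dtype)
    canvas[pad:pad + h, pad:pad + w] = image

    mask = np.full((h + 2 * pad, w + 2 * pad), 255, dtype=np.uint8)
    mask[pad:pad + h, pad:pad + w] = 0
    return canvas, mask

# Hypothetical 512x512 input extended by 128 pixels on each side.
image = np.full((512, 512, 3), 128, dtype=np.uint8)
canvas, mask = prepare_outpaint_inputs(image, pad=128)
print(canvas.shape, mask.shape)  # (768, 768, 3) (768, 768)
print(int(mask[0, 0]), int(mask[384, 384]))  # 255 0
```

The canvas and mask are then handed to an inpainting-style pipeline, which fills only the white border while keeping the original pixels intact.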
Live Portrait and Face Poke
Animating static portraits has never been easier than with Live Portrait and Face Poke.
TRELLIS
TRELLIS is a 3D generation model for versatile, high-quality 3D asset creation that took the 3D landscape by storm.
IC Light
IC-Light, which stands for "Imposing Consistent Light," is a tool for relighting images with a foreground condition.
What to expect from AI & Art in 2025?
2025 could be the year open source video, motion, and audio models catch up, making creation with more modalities accessible. With advances in efficient compute and quantization, we can expect a big leap in open source video models. As image generation models approach a (natural) plateau, the focus can shift to other tasks and modalities.
Starting strong: open source releases of January 2025
YuE, a series of open source music foundation models for full-song generation. YuE is arguably the best open source model for music generation so far (with an Apache 2.0 license!), achieving results competitive with closed source models like Suno.
Try it out and read more: demo, model weights.
Hunyuan3D-2, SPAR3D, DiffSplat: 3D generation models. 3D is hot right now; shortly after the release of TRELLIS, Hunyuan3D-2, SPAR3D, and DiffSplat joined the race to take over the 3D landscape.
Try them out and read more:
Lumina-Image 2.0: a text-to-image model. Lumina is a 2B parameter model that competes with the 8B Flux.1 [dev], with an Apache 2.0 license (!!).
Try it out and read more: demo, model weights.
ComfyUI-to-Gradio: how to convert a complex ComfyUI workflow into a simple Gradio application, and how to deploy that application on Hugging Face Spaces for free, serverless usage.
Newsletter announcement 🗞️
Kicking off with this blog post, we (Poli & Linoy) will bring you the latest from the world of creative AI. In such a rapidly evolving space, it's hard to keep up with all the new developments. That's where we come in.