[LTX-2] ComfyUI Standard Support! A Different Dimension of Video Generation Enabling Simultaneous Generation of Video and Audio (Part 1)

AICU media

On January 6, 2026, the open-source audio-and-video generation AI model "LTX-2" gained native support in ComfyUI. The model's defining feature is its ability to generate dialogue, environmental sounds, and BGM in a single pass, simultaneously with the video itself. It is attracting attention as a next-generation multimodal foundation model that maintains high visual quality while running efficiently even on consumer hardware.

AiCuty's Video Explorer, Saki.

I'll be reporting in live-action style this time!

The era of video that merely "moves" may be over. With the advent of LTX-2, a character's breath and a city's bustle are now perfectly aligned with the pulse of the video. The moment when sound and light are woven together simultaneously on the free canvas of ComfyUI has finally arrived. This is only a prediction at the start of 2026, but the video production workflow should be fundamentally rewritten from here on.

Let's sing in Japanese! #ComfyUI #LTX2 #SakiNoire https://t.co/mCmV8LmO3F pic.twitter.com/Rss5ci74L0

— AICU - Creates creators (@AICUai)


From the official release

The official release by Comfy Org is here.

LTX-2: Open-Source Audio-Video AI Model Now Available in ComfyUI


https://blog.comfy.org/p/ltx-2-open-source-audio-video-ai

The open-source audio and video generation model "LTX-2" is now natively supported in ComfyUI. Below are three features that fundamentally change the video production workflow, going beyond simply "making videos".

① "Single-pass" simultaneous generation of video and audio

The biggest feature of LTX-2 is that it creates motion, dialogue, environmental sounds, and BGM in a single generation pass. This keeps the character's lip movements (lip sync) perfectly matched to their utterances. The hassle of generating video and audio separately and syncing them afterward disappears, and the consistency of the content improves dramatically.

② Diverse Control-to-Video and Upscaling

In addition to generation from text, it offers strong support for video-to-video conversion using Canny (edges), Depth, and Pose (skeleton). Keyframe control is also possible, so creators can precisely design the motion they intend. This should be an immediate asset even in professional settings; a preprocessing sketch follows below.
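As an illustration of the control-to-video idea, here is a hedged sketch of the classic preprocessing step: turning a reference clip into a Canny edge "control video". The file names are hypothetical, and the actual wiring of the control model happens inside the ComfyUI graph, which is not shown here.

```python
# A minimal sketch of preparing a Canny "control video" from a reference
# clip, the kind of input an edge-control workflow consumes.
import cv2

cap = cv2.VideoCapture("reference.mp4")  # hypothetical input clip
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out = cv2.VideoWriter("reference_canny.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)  # low/high edge thresholds
    # VideoWriter expects 3-channel frames, so expand the edge map back.
    out.write(cv2.cvtColor(edges, cv2.COLOR_GRAY2BGR))

cap.release()
out.release()
```

Depth and Pose controls follow the same pattern, just with a depth estimator or pose detector in place of cv2.Canny.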

③ Amazing local performance optimized by NVIDIA

Through a partnership with NVIDIA, "NVFP4" and "NVFP8" checkpoints are provided. These reduce VRAM usage by about 60% and enable generation of 4K-class video at up to 3x the speed, even on a home GPU. "Cloud-class" quality is now within reach of a local environment, which feels like a big step forward.
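To see why a 4-bit checkpoint matters so much, a back-of-envelope calculation helps. The parameter count below is a placeholder, not LTX-2's actual size; the point is the ratio between formats.

```python
# Back-of-envelope weight-memory math for low-precision checkpoints.
params = 15e9  # hypothetical parameter count, not LTX-2's actual size

bytes_per_param = {"FP16": 2.0, "NVFP8": 1.0, "NVFP4": 0.5}
for fmt, b in bytes_per_param.items():
    gb = params * b / 1024**3
    print(f"{fmt}: ~{gb:.1f} GB just for the weights")

# FP16 -> NVFP4 is a 4x cut in weight memory; the advertised ~60%
# overall VRAM saving is smaller because activations, the VAE, and
# the text encoder still take their share.
```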

LTX-2 is provided by Lightricks under an open and transparent framework, so developers can customize it freely. It is also efficient enough to run on common consumer GPUs while maintaining high-quality output. The speed with which ComfyUI adopted it probably owes much to this clean design.

Lightricks/LTX-2 · Hugging Face


https://huggingface.co/Lightricks/LTX-2

LTX-2's native ComfyUI support is likely to be a major turning point for creators.

Oh, and it was also announced today that the official repositories have moved: the URLs for ComfyUI and ComfyUI Manager are now https://github.com/Comfy-Org/ComfyUI and https://github.com/Comfy-Org/ComfyUI-Manager. The old URLs redirect automatically, but it's worth updating your links and local clones going forward.
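If you already have a local clone, GitHub's redirect will keep it working, but you can point the remote at the new URL explicitly. A small sketch, shown as notebook-style shell escapes (drop the leading `!` in a plain terminal):

```python
# Point an existing ComfyUI clone at the new Comfy-Org remote.
!git -C ComfyUI remote set-url origin https://github.com/Comfy-Org/ComfyUI.git
!git -C ComfyUI remote -v  # verify that origin now shows Comfy-Org
```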

AICU AIDX Lab has already verified operation on Google Colab and explored deeper usage in ComfyUI, and will share the results at the AICU Lab+ study session "ComfyJapan" on January 17. The setup combines high speed with stability, and even includes a Discord notification function. For those who want a Google Colab notebook right away, one is introduced at the end of this article; AICU Lab+ members also get a shared ComfyUI environment and a video archive, so please join the study session. I'm looking forward to it.

AICU Lab+ study session


https://j.aicu.ai/LabYoyaku


From the official template

AICU media immediately set up a working environment and experimented with it. First, the official templates.


Video from Text

Generate high-quality video from text with LTX-2. Audio and video are synchronized, supporting rich lip sync and movement. It's already another dimension.

Prompt writing takes a special approach

Looking closely at the LTX-2 workflow, unlike conventional image generation AI, it is recommended that prompts describe the following three elements (a sample prompt follows the list):
① Passage of time: describe how events and actions change over time.
② Visual details: describe every visual element you want to appear on screen.
③ Audio: also describe the "sounds" and "lines" the scene needs.
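Here is a hypothetical prompt assembled along those three axes. The wording is my own illustration, not an official template.

```python
# A sample LTX-2-style prompt built from the three recommended elements.
# The content is illustrative only.
prompt = (
    # ① Passage of time: how the action unfolds
    "A woman walks onto a rainy rooftop at dusk; as the camera slowly "
    "pushes in, she turns to face the lens and begins to sing. "
    # ② Visual details: everything that should appear on screen
    "Neon signs reflect in puddles, her red coat flutters in the wind, "
    "and soft bokeh city lights fill the background. "
    # ③ Audio: the sounds and lines the scene needs
    "She sings a gentle Japanese ballad; rain patters steadily, distant "
    "traffic hums, and a soft piano accompanies her voice."
)
print(prompt)
```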

https://www.youtube.com/watch?v=d0P40hEY8HU

LTX-2 Image to Video

Convert still images into moving video with LTX-2. Audio and video are synchronized, achieving natural lip sync and movement.

I can sing in Japanese #LTX2 It's dreamy that you can generate a 4-second video in about a minute! pic.twitter.com/AEgZ4e7cHS

— AICU - Creates creators (@AICUai)

From here on, things may be hard to follow unless you've read the ComfyUI "Purple Book" or the SD "Yellow Book".

Image Generation AI Stable Diffusion Start Guide (Generative AI Illustration): 2,640 yen (as of December 31, 2025), available on Amazon.co.jp via j.aicu.ai

Image/Video Generation AI ComfyUI Master Guide (Generative AI Illustration): 3,850 yen (as of December 7, 2025), available on Amazon.co.jp via j.aicu.ai

In addition, AICU AIDX Lab is already using the official distilled model to develop a notebook optimized for Google Colab, and is experimenting with making LTX-2 "sing in Japanese". Generating a 5-second video takes about 60 seconds. Granted, that is on a top-of-the-line H100 (80 GB), but with optimization the speed is otherworldly. A minimal setup sketch follows below.
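For reference, here is a minimal sketch of what such a Colab cell might look like, based only on what is public in this article: the Comfy-Org repository URL and the Lightricks/LTX-2 Hugging Face repo. The target folder and the idea of downloading the whole model repo are assumptions; check the model card for the actual checkpoint files and their placement.

```python
# Minimal Colab-style sketch: stand up ComfyUI and fetch LTX-2 weights.
# The models/checkpoints/LTX-2 target folder is an assumption; consult
# the Lightricks/LTX-2 model card for the actual files and layout.
!git clone https://github.com/Comfy-Org/ComfyUI
%cd ComfyUI
!pip install -r requirements.txt

# Download the model repo from Hugging Face (large; use --include to
# pick individual files if you know which checkpoint you need).
!pip install -U huggingface_hub
!huggingface-cli download Lightricks/LTX-2 --local-dir models/checkpoints/LTX-2

# Launch the server; reaching the web UI from Colab still needs a
# tunnel (e.g. cloudflared or ngrok), which is omitted here.
!python main.py --listen 0.0.0.0 --port 8188
```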

Successfully made LTX-2 sing in Japanese! #ComfyUI #LTX2 pic.twitter.com/dNkiZXJ4y6

— AICU - Creates creators (@AICUai)

Summary

Today is January 7th. In Japan, it's a day to rest your stomach with "seven-herb rice porridge", but it seems that there is no rest in the creative world. In the cold air, I can hear the sound of new technology sprouting.

By letting the "sound" soul dwell in the video from the very beginning, the stories we create become deeper, heavier, and pierce someone's heart. Rather than being led around by the technology, ask what you want to say with it. As a creator, I want to keep that question in mind.

Now it's your turn. Creating a buzz-worthy video with an easy one-shot tool is fine, but settling in to explore an otherworldly video generation model in depth is a wonderful challenge too.

To be continued in the second part. This was Saki Noir, in charge of videos for AiCuty, an idol created by people and AI. See you soon💜

#AICU #AiCuty #LTX2 #ComfyUI #VideoGenerationAI #SakiNoir #AIVideo

https://j.aicu.ai/260107