LTX-2.3 is an open-source AI video generation model that creates 4K videos with synchronized audio in a single generation process. It’s an open alternative to OpenAI’s Sora 2 and Google’s Veo 3.
Released by Lightricks, this 20.9-billion-parameter diffusion transformer produces high-quality videos at up to 50 fps with coherent sound, dialogue, and music, all from text prompts, images, or audio inputs.
LTX-2.3 is the latest release of the LTX-2 model family and brings four engine-level improvements over the previous version: a rebuilt VAE for sharper fine detail, a 4× larger text connector for tighter prompt adherence, native portrait video support at 1080×1920, and significantly cleaner audio output.
What sets LTX-2.3 apart from proprietary alternatives is speed and cost. The model generates a six-second Full HD video in five seconds on consumer-grade GPUs, where competing models typically take one to two minutes for comparable output.
The model runs on high-end consumer hardware (NVIDIA RTX 4090 or 5090) and costs up to 50% less than closed alternatives. Model weights, training code, ComfyUI nodes, and reference workflows are all publicly available on GitHub and HuggingFace.
Features
Synchronized Audio and Video Generation: Creates visuals and sound together in one process, so dialogue, ambience, and music align naturally with on-screen motion.
Rebuilt VAE for Sharper Fine Detail: The redesigned latent space produces cleaner textures, hair detail, edge definition, and on-screen text across the full generation pipeline.
Tighter Prompt Adherence: A 4× larger gated attention text connector accurately resolves complex prompts covering multiple subjects, spatial relationships, and detailed stylistic instructions.
Native Portrait Video Support: Generates vertical video at up to 1080×1920, trained on portrait-orientation data with no cropping from landscape source material.
Cleaner Audio Output: Filtered training data and a new vocoder reduce noise artifacts, silence gaps, and audio drift across text-to-video and audio-conditioned workflows.
Stronger Image-to-Video Generation: Produces better visual consistency from the input frame, less freezing, and more natural motion compared to LTX-2.
Multiple Performance Modes: Three modes balance speed and fidelity: Fast mode for live previews and rapid iteration, Pro mode for balanced quality and turnaround, and Ultra mode for maximum 4K fidelity at 50 fps with synchronized audio (up to 15 seconds).
Native 4K Resolution: Outputs up to 4K at 48–50 fps with sharp textures and smooth motion, supporting both 16:9 and portrait aspect ratios.
Extended Video Length: Generates clips up to 10–15 seconds with synchronized audio, depending on the mode, and supports video extension and keyframe-based generation for longer sequences.
Multi-Keyframe Conditioning: Controls scene pacing, motion, and tone through multiple keyframes, with 3D camera logic for precise frame-level control.
Multimodal Inputs: Accepts text prompts, images, video references, depth maps, pose estimation, and audio inputs for detailed creative guidance.
LoRA Fine-Tuning: Customizes the model for brand-specific styles or character consistency through Low-Rank Adaptation. Note: LoRAs trained on LTX-2 require retraining for the updated LTX-2.3 latent space.
Production Pipeline Integration: Works directly with editing suites, VFX tools, game engines, and platforms including Fal, Replicate, RunDiffusion, and ComfyUI via API access.
Efficient Compute: Runs on consumer-grade NVIDIA GPUs with up to 50% lower costs than competing models, with multi-GPU inference delivering faster-than-playback generation speeds.
Open-Source: Built on a hybrid diffusion-transformer (DiT) architecture. Model weights, training code, ComfyUI custom nodes, and reference workflows are all publicly available.
LTX Desktop Integration: Powers LTX Desktop, a free open-source non-linear video editor that runs LTX-2.3 entirely on local hardware.
Use Cases
Marketing and Product Demos: Generate promotional videos, product visualizations, or branded content with synchronized voiceovers and background music. The Fast mode lets teams iterate on concepts during client calls, then switch to Ultra mode for final delivery.
Game Development and Cinematics: Transform concept art or character poses into dynamic cutscenes without building full 3D pipelines. Use keyframe conditioning to control pacing and framing, then apply LoRA fine-tuning to maintain visual consistency across scenes.
VFX and Post-Production: Automate motion tracking, rotoscoping, and plate replacement while preserving cinematic quality. The model delivers broadcast-ready composites faster than real-time and integrates with existing VFX stacks.
Pre-Visualization and Storyboarding: Simulate camera movements, lighting setups, and scene pacing before production begins. Directors can visualize storyboards with realistic motion previews and refine compositions with clients before stepping on set.
Content Restoration and Upscaling: Enhance archival footage or rough renders up to native 4K while protecting the original creative intent. The model handles interpolation and style-preserving restoration for film remastering and animation cleanup.
Social Media and Short-Form Content: Create engaging clips for Instagram, TikTok, or YouTube Shorts with synchronized audio tracks. The Fast mode generates multiple variations quickly for A/B testing different concepts.
How to Use It
LTX-2.3 is available through three main access points: the LTX API, integration platforms like Fal and Replicate, and local deployment via the open-source model weights.
Using the LTX-2.3 API
Request API access through the official LTX website at ltx.io. The API supports the same three performance modes as local inference.
Fast mode generates quick previews for mobile workflows and high-throughput ideation. Pro mode balances strong fidelity with fast turnaround for daily production and marketing teams. Ultra mode delivers maximum fidelity up to 4K at 50 fps with synchronized audio, suited for professional production and VFX.
After getting access, test generation through the LTX Playground before integrating into production workflows.
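As a rough illustration of what an API integration might look like, the sketch below composes and submits a generation request. The endpoint URL, header names, and payload field names are assumptions for illustration only, not the documented LTX API schema; consult the official API reference at ltx.io for the real contract.

```python
import json
import urllib.request

# NOTE: this URL is a hypothetical placeholder, not the documented endpoint.
LTX_API_URL = "https://api.ltx.io/v1/generate"


def build_generation_request(prompt: str, mode: str = "pro",
                             resolution: str = "1920x1080",
                             duration_s: int = 6) -> dict:
    """Compose a request payload; field names here are assumed, not official."""
    if mode not in ("fast", "pro", "ultra"):
        raise ValueError(f"unknown mode: {mode!r}")
    return {
        "prompt": prompt,
        "mode": mode,
        "resolution": resolution,
        "duration": duration_s,
    }


def submit(payload: dict, api_key: str) -> bytes:
    """POST the payload with a bearer token from the LTX Console."""
    req = urllib.request.Request(
        LTX_API_URL,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


if __name__ == "__main__":
    # Build a Fast-mode request for quick iteration; submit() is not called
    # here because it needs a real API key.
    payload = build_generation_request("a fox running through snow", mode="fast")
    print(payload)
```

Separating payload construction from submission makes it easy to validate mode and resolution choices locally before spending API credits.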
Using Platform Integrations
LTX-2.3 is available through Fal, Replicate, RunDiffusion, and ComfyUI. These integrations offer a user-friendly interface for video generation without managing the underlying infrastructure. ComfyUI users can access official custom nodes and reference workflows directly from the LTX-Video GitHub repository.
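On Replicate, generation typically goes through the `replicate` Python client's `replicate.run` call. The model slug and input field names in this sketch are hypothetical placeholders; the real ones are defined on the model's Replicate page.

```python
def build_replicate_input(prompt: str, aspect_ratio: str = "16:9",
                          generate_audio: bool = True) -> dict:
    """Input dict for a hypothetical LTX-2.3 listing on Replicate.
    Field names are assumed; check the model page for the real schema."""
    return {
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "generate_audio": generate_audio,
    }


def run_on_replicate(prompt: str):
    """Requires `pip install replicate` and REPLICATE_API_TOKEN in the env."""
    import replicate  # imported lazily so the helper above stays stdlib-only
    # "lightricks/ltx-2.3" is a placeholder slug, not a confirmed model id.
    return replicate.run("lightricks/ltx-2.3",
                         input=build_replicate_input(prompt))


if __name__ == "__main__":
    # Portrait-oriented input for short-form social content.
    print(build_replicate_input("ocean waves at sunset", aspect_ratio="9:16"))
```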
Local Deployment
Model weights are freely available on HuggingFace. The download wizard in LTX Desktop (Windows) lets you choose which models to install. For direct deployment, the repository requires Python 3.10.5+ and CUDA 12.2.
The open-source release includes:
Base dev checkpoint (~20GB)
Quantized fp8 variant (lower VRAM requirement)
Distilled model for faster inference
ComfyUI custom nodes
Reference workflows
Training code

An H100 GPU generates five seconds of video in approximately two seconds. Consumer-grade NVIDIA cards such as the RTX 4090 or 5090 handle production workloads, with the quantized fp8 variant reducing VRAM requirements for lower-spec setups.
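A minimal local-setup sketch, assuming the weights live in a HuggingFace repo: pick a checkpoint variant based on available VRAM, then fetch it with `huggingface_hub.snapshot_download`. The 24 GB cutoff is a rough heuristic (RTX 4090/5090 class cards), not an official requirement, and the repo id is a placeholder.

```python
def pick_checkpoint_variant(vram_gb: float) -> str:
    """Choose between the full dev checkpoint and the quantized fp8 variant.
    The 24 GB threshold is an assumed heuristic, not an official figure."""
    return "base-dev" if vram_gb >= 24 else "fp8-quantized"


def download_weights(dest: str = "./ltx-2.3") -> str:
    """Fetch weights from HuggingFace. Requires `pip install huggingface_hub`;
    the repo id below is a placeholder -- use the actual LTX-2.3 repo name."""
    from huggingface_hub import snapshot_download  # lazy third-party import
    return snapshot_download(repo_id="Lightricks/LTX-2.3",  # placeholder id
                             local_dir=dest)


if __name__ == "__main__":
    print(pick_checkpoint_variant(24))  # RTX 4090-class card
    print(pick_checkpoint_variant(16))  # lower-spec setup
```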
LTX Desktop (Local Video Editor)
LTX Desktop is a free, open-source non-linear editor that runs LTX-2.3 fully on-device. It ships with text-to-video, image-to-video, audio-to-video, retake, and context-aware gap fill built into the editing timeline.
Windows users with qualifying NVIDIA hardware pay nothing per render after setup. macOS users route generation through the LTX API with an API key.
Key Model Downloads Reference
| Model | Purpose | Required | Size |
|---|---|---|---|
| checkpoint | LTX-2.3 main weights | Yes | ~20GB |
| distilled_lora | Fast mode (8-step inference) | For Fast mode | ~500MB |
| upsampler | 2× upscaling to 1080p | For 1080p output | ~2GB |
| text_encoder | Local T5 text encoding | Optional | ~5GB |
| Z-Image Turbo | Still image generation | For image gen | ~30GB |
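The table above doubles as a disk-budget checklist. This small sketch hard-codes the approximate sizes listed there and computes what a given workflow needs to download; the dictionary keys mirror the table's model names.

```python
# Approximate sizes in GB, taken from the downloads table above.
MODEL_SIZES_GB = {
    "checkpoint": 20.0,      # LTX-2.3 main weights (always required)
    "distilled_lora": 0.5,   # Fast mode (8-step inference)
    "upsampler": 2.0,        # 2x upscaling to 1080p
    "text_encoder": 5.0,     # optional local T5 text encoding
    "z_image_turbo": 30.0,   # still image generation
}


def download_plan(fast_mode=False, want_1080p=False,
                  local_text_encoding=False, image_gen=False):
    """Return (model list, total GB) for a chosen workflow."""
    models = ["checkpoint"]
    if fast_mode:
        models.append("distilled_lora")
    if want_1080p:
        models.append("upsampler")
    if local_text_encoding:
        models.append("text_encoder")
    if image_gen:
        models.append("z_image_turbo")
    return models, sum(MODEL_SIZES_GB[m] for m in models)


if __name__ == "__main__":
    models, total = download_plan(fast_mode=True, want_1080p=True)
    print(models, f"~{total:g} GB")  # ~22.5 GB for Fast mode with 1080p output
```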
Pros
- LTX-2.3 creates dialogue, ambient sound, and music that sync with on-screen motion in a single generation pass.
- The rebuilt VAE and larger text connector improve texture fidelity, prompt accuracy, and image-to-video consistency over LTX-2.
- The model generates vertical video at 1080×1920 from portrait-trained data.
- A six-second Full HD clip renders in five seconds on consumer hardware.
- At $0.04 to $0.16 per second depending on resolution and mode, the LTX API runs cheaper than proprietary alternatives.
- Full access to model weights, training code, ComfyUI nodes, and reference workflows.
- Ultra mode delivers actual 4K at 50 fps, not upscaled HD.
Cons
- Hardware requirements are steep: local inference needs RTX 4090 or 5090 class GPUs, and lower-spec machines struggle with higher resolutions and longer sequences.
- LoRAs trained for LTX-2 do not transfer to LTX-2.3.
- LTX Desktop and local inference only support NVIDIA GPUs at this time.
- The 10–15 second maximum clip length is short for many production scenarios.
Related Resources
- LTX-2.3 Model on HuggingFace: Download the base dev checkpoint, quantized fp8 variant, and distilled model weights.
- LTX-Video GitHub Repository: Training code, ComfyUI custom nodes, reference workflows, and inference tooling for LTX-2.3.
- LTX Desktop GitHub Repository: Source code, releases, and issue tracker for the free local AI video editor.
- LTX Console: Generate a free LTX API key for text encoding or video generation via the API.
- Lightricks Discord Community: Active community for sharing workflows, troubleshooting issues, and connecting with other developers.
FAQs
Q: Can I use LTX-2.3 for commercial projects?
A: Yes. The model is available for commercial use. Outputs are generated from a model trained on licensed data from Getty Images and Shutterstock, which reduces copyright risk for commercial deployment in marketing, advertising, and client work. Companies with more than $10M in annual revenue need a commercial license for the model weights. The LTX Desktop application ships separately under Apache 2.0, which permits commercial use without a revenue threshold.
Q: What hardware do I need to run LTX-2.3 locally?
A: LTX-2.3 requires a CUDA-compatible NVIDIA GPU. High-end consumer cards like the RTX 4090 or 5090 handle the standard workflow. The quantized fp8 variant lowers VRAM requirements for lower-spec setups. For maximum throughput, an H100 GPU generates five seconds of video in approximately two seconds. For 1080p output via the upsampler model, you need at least 12GB of VRAM. AMD and Intel GPUs are not currently supported.
Q: What is LTX Desktop and how does it relate to LTX-2.3?
A: LTX Desktop is a free, open-source non-linear video editor that runs LTX-2.3 entirely on local hardware. It ships as a working reference implementation of the LTX engine and includes text-to-video, image-to-video, audio-to-video, retake, context-aware gap fill, color correction, trim tools, subtitle editing, and XML timeline import and export for Premiere Pro, DaVinci Resolve, and Final Cut Pro. On Windows with a qualifying NVIDIA GPU, generation costs nothing per render after setup. Mac users run generation via the LTX API.
Q: Does LTX-2.3 support portrait video for social media?
A: Yes. LTX-2.3 generates native vertical video at up to 1080×1920, trained specifically on portrait-orientation data. This means the model produces proper portrait output rather than cropping or letterboxing landscape-generated content. The format is production-ready for TikTok, Instagram Reels, and YouTube Shorts.
Q: What is the difference between Fast, Pro, and Ultra modes?
A: Fast mode prioritizes speed for live iteration, mobile workflows, and high-throughput ideation. Pro mode balances fidelity with turnaround time and suits daily production and marketing work. Ultra mode delivers maximum fidelity at up to 4K and 50 fps with synchronized audio, suited for final-stage professional production and VFX. The distilled model (downloaded separately as distilled_lora) powers Fast mode and handles generation in as few as eight steps.
Changelog
Mar 09, 2026
- Updated for LTX-2.3
Jan 07, 2026
- LTX-2 is Now Available: the next generation of LTX with synchronized audio+video generation!
Oct 24, 2025
- Updated for LTX 2
Dec 19, 2024
- Released 0.1.2