LTX2.3-22B ReStyle IC-LoRA v0.1

Prompt: make it Disney 2D animation style. This style is characterized by clean, consistent line art and a vibrant, warm color palette. It features expressive, rounded character designs reminiscent of contemporary storybook illustrations or digital 2D feature films. The background utilizes soft textures and gentle lighting to create a friendly, polished, and approachable visual atmosphere.

Prompt: make it childlike MS Paint digital art style. This style is defined by a deliberately primitive and untrained aesthetic, made using basic digital tools. It features rough, unaliased pixel edges, simplified stick-figure anatomy, and flat, untextured color fills. The perspective is often rudimentary and the color palette basic, resembling art created by a very young child using foundational computer programs like Microsoft Paint.

This is an early v0.1 release with known limitations. It works with simpler styles (e.g. flat 2D, cel-shaded, or monochrome line art) but struggles with more complex styles that involve texture, intricate detail, or strong material/lighting effects.

Quality often improves noticeably if you:

raise CFG to ~1.1 – 2.0 (CFG=1 is the distilled-model default)

use a non-distilled LTX-2.3 model
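To see why CFG=1 is the distilled default, it helps to look at the standard classifier-free guidance formula. Each sampling step blends an unconditional and a conditional prediction as uncond + cfg * (cond - uncond). At cfg = 1 the unconditional term cancels out, so a distilled model can skip that pass. Values above 1 push the result further along the (cond - uncond) direction, at the cost of a second model evaluation per step. Below is a minimal sketch on plain lists. The function name is hypothetical; the sampler in ComfyUI does this internally on tensors.

```python
from typing import Sequence


def cfg_combine(uncond: Sequence[float], cond: Sequence[float], cfg: float) -> list[float]:
    """Standard classifier-free guidance, applied elementwise:
    uncond + cfg * (cond - uncond)."""
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same length")
    return [u + cfg * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0, -0.5]   # toy unconditional prediction
cond = [0.2, 0.8, 0.5]      # toy conditional (prompt + reference) prediction

print(cfg_combine(uncond, cond, 1.0))  # reduces to the conditional prediction
print(cfg_combine(uncond, cond, 1.5))  # extrapolates past it, away from uncond
```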

An IC-LoRA (in-context LoRA) for LTX-Video 2.3 (22B) trained for image-guided style transfer: given a source video and a single reference image describing the target style, the model re-renders the video in that style while preserving the original content and motion.

Training details

This IC-LoRA was trained on RunPod cloud GPUs.

Base model: Lightricks/LTX-2.3 (22B)
Training framework: ltx-trainer (Lightricks)
Training strategy: video-to-video IC-LoRA (first_frame_conditioning_p: 0.0, reference latents stream carries style)
Released checkpoint: step 8,000
LoRA rank / alpha: 128 / 128
Target modules: attn1.{to_k,to_q,to_v,to_out.0} + attn2.{to_k,to_q,to_v,to_out.0} (self + cross attention)
Optimizer: Prodigy
Scheduler: constant
Mixed precision: bf16
Batch size: 1 (gradient checkpointing on)
Timestep sampling: shifted_logit_normal
Resolution: trained at 768x448 @ 97 frames
Dataset: 562 cross-pair samples derived from the Ditto-1M style-transfer dataset (50 styles × ~11 pairs each). Each training reference is constructed by replacing frame 0 of the source video with the stylized first frame of a different pair from the same style.
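The cross-pair construction described in the Dataset line can be sketched as follows. This is an illustration, not the actual data pipeline. Several details are assumptions: videos are represented as plain lists of frames, each pair yields exactly one sample, the donor pair is picked at random, and the training target is the stylized video of the same pair. The names `Pair` and `build_cross_pair_samples` are hypothetical. The key point is that the style frame in the reference always comes from a *different* clip. This means the model cannot simply copy the target's own first frame and has to transfer the style instead.

```python
import random
from dataclasses import dataclass
from typing import Any


@dataclass
class Pair:
    style: str
    source: list[Any]    # source video as a list of frames (assumed representation)
    stylized: list[Any]  # stylized video, assumed to have the same length as source


def build_cross_pair_samples(pairs: list[Pair], seed: int = 0) -> list[tuple[list[Any], list[Any]]]:
    """For each pair, return (reference, target). The reference is the pair's source
    video with frame 0 replaced by the stylized first frame of a different pair of the
    same style; the target (assumed) is the pair's own stylized video."""
    rng = random.Random(seed)
    by_style: dict[str, list[int]] = {}
    for i, p in enumerate(pairs):
        by_style.setdefault(p.style, []).append(i)

    samples = []
    for i, p in enumerate(pairs):
        donors = [j for j in by_style[p.style] if j != i]
        if not donors:
            continue  # a style with only one pair cannot form a cross pair
        j = rng.choice(donors)
        reference = [pairs[j].stylized[0]] + p.source[1:]
        samples.append((reference, p.stylized))
    return samples


pairs = [
    Pair("watercolor", ["s0", "s1", "s2"], ["w0", "w1", "w2"]),
    Pair("watercolor", ["t0", "t1", "t2"], ["v0", "v1", "v2"]),
]
for ref, tgt in build_cross_pair_samples(pairs):
    print(ref, "->", tgt)
# ['v0', 's1', 's2'] -> ['w0', 'w1', 'w2']
# ['w0', 't1', 't2'] -> ['v0', 'v1', 'v2']
```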

Inference

For inference I used ComfyUI. Workflow available here: Cseti/ComfyUI-Workflows — restyle-ic-lora.
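When preparing a source video for the workflow, it may help to stay close to the training setting (768x448 @ 97 frames) and to the shape constraints LTX-Video models generally document. Those constraints are width and height divisible by 32 and a frame count of the form 8n + 1 (97 = 8 · 12 + 1). Treat these constraints as an assumption and check them against the LTX-2.3 version you run. The helper below is hypothetical. It rounds a requested size down to the nearest valid values.

```python
def snap_to_ltx_grid(width: int, height: int, frames: int) -> tuple[int, int, int]:
    """Round width/height down to multiples of 32 and the frame count down to 8n + 1.
    The 32 / 8n+1 constraints are assumed from LTX-Video documentation."""
    if width < 32 or height < 32 or frames < 1:
        raise ValueError("input is smaller than the minimum grid")
    w = (width // 32) * 32
    h = (height // 32) * 32
    f = ((frames - 1) // 8) * 8 + 1
    return w, h, f


print(snap_to_ltx_grid(768, 448, 97))    # (768, 448, 97): the training setting is already valid
print(snap_to_ltx_grid(1280, 720, 120))  # (1280, 704, 113)
```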

Conditioning: both modalities are supported, and combining them works best:

Image reference: a single still image in the requested style, fed as frame 0
Text prompt: e.g. "Make it Disney 2D Animation style." or "Make it watercolor style.", matching the training caption template ("Make it {style} style.").

Strength: 1.0.
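This sketch assumes that "Strength" refers to the LoRA weight multiplier in the workflow. It uses the standard LoRA merge, W' = W + strength · (alpha / rank) · (B · A). With the rank/alpha of 128/128 listed under Training details, alpha / rank = 1, so the strength value is the only extra scale on the learned update. The code uses plain lists and hypothetical function names. It illustrates the formula and is not ComfyUI's loader code.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python product of a (m x k) and b (k x n)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def merge_lora(W: list[list[float]], A: list[list[float]], B: list[list[float]],
               alpha: float, strength: float = 1.0) -> list[list[float]]:
    """Return W + strength * (alpha / rank) * (B @ A).
    A is (rank x in_features), B is (out_features x rank)."""
    rank = len(A)
    scale = strength * alpha / rank   # 128 / 128 = 1.0 for this LoRA
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # toy base weight
A = [[1.0, 2.0]]              # rank 1, so alpha = 1 mirrors alpha == rank
B = [[1.0], [0.0]]
print(merge_lora(W, A, B, alpha=1.0))                # [[2.0, 2.0], [0.0, 1.0]]
print(merge_lora(W, A, B, alpha=1.0, strength=0.5))  # [[1.5, 1.0], [0.0, 1.0]]
```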

Prompting tips

The style reference image carries the primary signal; the text prompt reinforces and disambiguates it. A few patterns that help:

Match the training caption template "Make it {style} style.", e.g. "Make it watercolor style." or "Make it Disney 2D Animation style.". This short form is the safe default.
Add a more detailed style description: expanding the prompt with cues about technique, medium, palette, and lighting can steer the model toward your intent (as in the two example prompts at the top of this card).
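The two prompting patterns above can be combined with a tiny helper. It always emits the training template first and then optionally appends a longer description. The function is hypothetical and only illustrates the prompt structure. It is not part of the released workflow.

```python
from typing import Optional


def build_style_prompt(style: str, details: Optional[str] = None) -> str:
    """Start from the training caption template 'Make it {style} style.' and
    optionally append technique / medium / palette / lighting cues."""
    prompt = f"Make it {style.strip()} style."
    if details:
        prompt += " " + details.strip()
    return prompt


print(build_style_prompt("watercolor"))
print(build_style_prompt(
    "Disney 2D Animation",
    "Clean, consistent line art and a vibrant, warm color palette.",
))
```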

Important Notes

This LoRA was created as part of a research project. The training data is derived from the publicly released Ditto-1M dataset; please respect the licensing terms of the source dataset and of any source video content. Users use the model at their own risk and are obligated to comply with applicable copyright laws.

Acknowledgement

Special thanks to:

Lightricks for open-sourcing the LTX-2 trainer and the LTX-2.3 22B model
The authors of Ditto-1M for releasing the style-transfer dataset that made this LoRA possible

Support

Training models like this requires renting cloud GPUs, which gets expensive quickly. If you find this LoRA useful and would like me to keep contributing open-source models, your support is very much appreciated: