Introduction

We introduce LongCat-Image-Edit, the image-editing version of LongCat-Image. LongCat-Image-Edit supports bilingual (Chinese-English) editing and achieves state-of-the-art performance among open-source image editing models, with leading instruction following, high image quality, and superior visual consistency.

Key Features

🌟 Precise Editing: LongCat-Image-Edit supports a wide range of editing tasks, including global editing, local editing, text modification, and reference-guided editing. Its strong semantic understanding lets it carry out edits precisely as instructed.
🌟 Consistency Preservation: LongCat-Image-Edit keeps attributes of non-edited regions, such as layout, texture, color tone, and subject identity, unchanged unless the instruction targets them. This is especially evident in multi-turn editing.
🌟 Strong Benchmark Performance: Among open-source image editing models, LongCat-Image-Edit achieves state-of-the-art (SOTA) performance on image editing tasks while also significantly improving inference efficiency.
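Multi-turn editing chains edits: each output image becomes the input to the next instruction. The following is a minimal sketch, assuming an `edit` callable with the signature `edit(image, prompt) -> image`; the helper name `multi_turn_edit` is hypothetical and not part of the LongCat-Image package.

```python
from typing import Callable, List, TypeVar

ImageT = TypeVar("ImageT")


def multi_turn_edit(
    image: ImageT,
    prompts: List[str],
    edit: Callable[[ImageT, str], ImageT],
) -> List[ImageT]:
    """Apply each instruction in order, feeding every result into the next turn.

    Returns one intermediate result per prompt so that consistency across
    turns (layout, texture, identity) can be inspected afterwards.
    """
    results: List[ImageT] = []
    current = image
    for prompt in prompts:
        current = edit(current, prompt)  # the previous output is the next input
        results.append(current)
    return results
```

With the pipeline from the Run Image Editing section, `edit` could be a thin wrapper such as `lambda im, p: pipe(im, p, negative_prompt='', guidance_scale=4.5, num_inference_steps=50).images[0]`.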


Quick Start


Installation

Clone the repo:

```bash
git clone --single-branch --branch main https://github.com/meituan-longcat/LongCat-Image
cd LongCat-Image
```

Install dependencies:

```bash
# Create and activate a conda environment
conda create -n longcat-image python=3.10
conda activate longcat-image

# Install the requirements and the package in development mode
pip install -r requirements.txt
python setup.py develop
```
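To confirm the installation, you can check that the packages imported by the image-editing example in this README resolve in the active environment. This is a minimal sketch using only the standard library. The helper name `missing_packages` is hypothetical, and the package list is taken from the example's imports (`torch`, `PIL`, `transformers`, `longcat_image`).

```python
import importlib.util
from typing import Iterable, List

# Top-level packages imported by the image-editing example in this README.
REQUIRED = ("torch", "PIL", "transformers", "longcat_image")


def missing_packages(names: Iterable[str] = REQUIRED) -> List[str]:
    """Return the top-level packages that cannot be found in this environment."""
    # find_spec returns None for an uninstalled top-level package
    # without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages found.")
```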

Run Image Editing

📝 Special Handling for Text Rendering

For both Text-to-Image and Image Editing tasks involving text generation, you must enclose the target text within single or double quotation marks (both English '...' / "..." and Chinese ‘...’ / “...” styles are supported).

Reason: The model uses a dedicated character-level encoding for quoted content. Without explicit quotation marks this mechanism is not triggered, which severely degrades text rendering.
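Because the character-level encoding only applies to quoted spans, it can help to check a prompt before running the pipeline. The sketch below is a hypothetical pre-flight helper (`quoted_spans` is not part of the LongCat-Image package) that extracts text enclosed in the four supported quote styles with a regular expression. It only approximates what the model treats as quoted; for example, an English apostrophe (as in "don't") can open a spurious single-quoted span.

```python
import re
from typing import List

# The four quote styles listed as supported: '...', "...", ‘...’, “...”.
_QUOTE_PATTERN = re.compile(r"'([^']+)'|\"([^\"]+)\"|‘([^’]+)’|“([^”]+)”")


def quoted_spans(prompt: str) -> List[str]:
    """Return the text inside each quoted span, in order of appearance."""
    spans = []
    for match in _QUOTE_PATTERN.finditer(prompt):
        # Exactly one of the four capture groups participates in each match.
        spans.append(next(g for g in match.groups() if g is not None))
    return spans


prompt = 'Change the sign text to "OPEN 24 HOURS"'
if not quoted_spans(prompt):
    print("Warning: no quoted text found; text rendering may be degraded.")
```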

```python
import torch
from PIL import Image
from transformers import AutoProcessor
from longcat_image.models import LongCatImageTransformer2DModel
from longcat_image.pipelines import LongCatImageEditPipeline

device = torch.device('cuda')
checkpoint_dir = './weights/LongCat-Image-Edit'  # local checkpoint directory

# Load the text processor and the transformer (in bfloat16) from the checkpoint
text_processor = AutoProcessor.from_pretrained(checkpoint_dir, subfolder='tokenizer')
transformer = LongCatImageTransformer2DModel.from_pretrained(
    checkpoint_dir,
    subfolder='transformer',
    torch_dtype=torch.bfloat16,
    use_safetensors=True,
).to(device)

pipe = LongCatImageEditPipeline.from_pretrained(
    checkpoint_dir,
    transformer=transformer,
    text_processor=text_processor,
)
# pipe.to(device, torch.bfloat16)  # Use instead of CPU offload on high-VRAM GPUs (faster inference)
pipe.enable_model_cpu_offload()  # Offload to CPU to save VRAM (requires ~19 GB); slower but prevents OOM

generator = torch.Generator("cpu").manual_seed(43)  # fixed seed for reproducible results
img = Image.open('assets/test.png')
prompt = '将猫变成狗'  # "Turn the cat into a dog"
image = pipe(
    img,
    prompt,
    negative_prompt='',
    guidance_scale=4.5,
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=generator,
).images[0]

image.save('./edit_example.png')
```
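The example offers two memory modes: `pipe.to(device, torch.bfloat16)` keeps every component on the GPU for faster inference, while `pipe.enable_model_cpu_offload()` trades speed for a lower VRAM footprint (about 19 GB, per the comment in the example). The sketch below picks one based on the GPU's total memory. The helper names and the `full_gpu_threshold_gb` value are hypothetical: this README does not state how much memory full GPU placement needs, so set the threshold for your own hardware.

```python
def should_offload(total_vram_gb: float, full_gpu_threshold_gb: float) -> bool:
    """Return True when the GPU is too small to hold the whole pipeline."""
    return total_vram_gb < full_gpu_threshold_gb


def configure_memory(pipe, full_gpu_threshold_gb: float, device: str = "cuda"):
    """Apply either full GPU placement or model CPU offload to `pipe`.

    `full_gpu_threshold_gb` is an assumption to be tuned per machine.
    """
    import torch  # imported here so should_offload() works without torch

    props = torch.cuda.get_device_properties(torch.device(device))
    total_gb = props.total_memory / 1024**3
    if should_offload(total_gb, full_gpu_threshold_gb):
        pipe.enable_model_cpu_offload()  # slower, lower peak VRAM
    else:
        pipe.to(device, torch.bfloat16)  # faster, needs more VRAM
    return pipe
```

Call `configure_memory(pipe, full_gpu_threshold_gb=...)` in place of the two alternative lines in the example; only one of the two modes should be active at a time.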
