
Conversation

@Deng-Xian-Sheng
Support int8-quanto quantized fine-tuning for Qwen-Image-Edit-2509; 48 GB of VRAM is now enough to fine-tune 🎉. The training script also gains a result_image_field_name parameter.

Here is an example launch command:

accelerate launch train.py \
  --dataset_base_path "/root/autodl-tmp/dataset" \
  --dataset_metadata_path "/root/autodl-tmp/metadata_qwen_imgae_edit_multi_one.json" \
  --result_image_field_name "result_image" \
  --data_file_keys "result_image,reference_img" \
  --extra_inputs "reference_img" \
  --max_pixels 1048576 \
  --dataset_repeat 50 \
  --model_paths '[
    [
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00001-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00002-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00003-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00004-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00005-of-00005.safetensors"
    ],
    [
      "Qwen-Image-Edit-2509/text_encoder/model-00001-of-00004.safetensors",
      "Qwen-Image-Edit-2509/text_encoder/model-00002-of-00004.safetensors",
      "Qwen-Image-Edit-2509/text_encoder/model-00003-of-00004.safetensors",
      "Qwen-Image-Edit-2509/text_encoder/model-00004-of-00004.safetensors"
    ],
    "Qwen-Image-Edit-2509/vae/diffusion_pytorch_model.safetensors"
  ]' \
  --tokenizer_path "Qwen-Image-Edit-2509/tokenizer" \
  --processor_path "Qwen-Image-Edit-2509/processor" \
  --learning_rate 1e-4 \
  --num_epochs 5 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "./checkpoint_one" \
  --lora_base_model "dit" \
  --lora_target_modules "to_q,to_k,to_v,add_q_proj,add_k_proj,add_v_proj,to_out.0,to_add_out,img_mlp.net.2,img_mod.1,txt_mlp.net.2,txt_mod.1" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --dataset_num_workers 8 \
  --find_unused_parameters \
  --base_model_precision int8-quanto \
  --text_encoder_1_precision no_change

The new parameters are:

  --result_image_field_name "result_image" \
  --base_model_precision int8-quanto \
  --text_encoder_1_precision no_change \
  --quantize_activations \
  --quantize_vae

--result_image_field_name specifies a field name to use in place of the old image field, whose meaning was not very clear when fine-tuning Qwen-Image-Edit models.

You can still use image as the field name for the model's target output image; just change:

--data_file_keys "result_image,reference_img" \
--extra_inputs "reference_img" \

to:

--data_file_keys "image,edit_image" \
--extra_inputs "edit_image" \

That keeps things consistent with the author's original example. Still, I would suggest writing a small script to rename the fields in your JSON metadata (see the sketch below), since the old field names are somewhat ambiguous. Functionally it makes no difference; it is purely a readability choice.
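
A minimal renaming sketch, assuming the metadata is a JSON array of records; the file names and the old-to-new field mapping below are placeholders to adapt:

import json

SRC = "metadata_old.json"        # placeholder input path
DST = "metadata_renamed.json"    # placeholder output path
RENAMES = {"image": "result_image", "edit_image": "reference_img"}

with open(SRC, encoding="utf-8") as f:
    records = json.load(f)

# Rename each old field to its new name, leaving other fields untouched.
for record in records:
    for old, new in RENAMES.items():
        if old in record:
            record[new] = record.pop(old)

with open(DST, "w", encoding="utf-8") as f:
    json.dump(records, f, ensure_ascii=False, indent=2)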

--base_model_precision quantizes the DiT, which is likely the most VRAM-hungry part.

--text_encoder_1_precision quantizes the text encoder, i.e. the Qwen2.5-VL part.

--quantize_activations when using quanto, quantizes activations in addition to weights.

--quantize_vae quantizes the VAE; this may slightly degrade very fine textures (usually acceptable).
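
For intuition, here is roughly what quanto-style weight quantization does, shown as a minimal standalone sketch with the optimum-quanto API on a toy module; the training script wires this up internally through the flags above, so this is illustrative, not the repo's actual code path:

import torch
import torch.nn as nn
from optimum.quanto import quantize, freeze, qint8

# Toy stand-in for the DiT / text encoder; the real script quantizes
# the loaded Qwen-Image-Edit modules instead.
model = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

# Weight-only int8 quantization; pass activations=qint8 as well to also
# quantize activations (what --quantize_activations toggles).
quantize(model, weights=qint8)
freeze(model)  # materialize int8 weights in place of the float ones

x = torch.randn(1, 64)
with torch.no_grad():
    print(model(x).shape)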


I tested on a 4090 48G with the following configuration:

accelerate launch train.py \
  --dataset_base_path "/root/autodl-tmp/dataset" \
  --dataset_metadata_path "/root/autodl-tmp/metadata_qwen_imgae_edit_multi_one.json" \
  --result_image_field_name "result_image" \
  --data_file_keys "result_image,reference_img" \
  --extra_inputs "reference_img" \
  --max_pixels 1048576 \
  --dataset_repeat 1 \
  --model_paths '[
    [
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00001-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00002-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00003-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00004-of-00005.safetensors",
      "Qwen-Image-Edit-2509/transformer/diffusion_pytorch_model-00005-of-00005.safetensors"
    ],
    [
      "Qwen-Image-Edit-2509/text_encoder/model-00001-of-00004.safetensors",
      "Qwen-Image-Edit-2509/text_encoder/model-00002-of-00004.safetensors",
      "Qwen-Image-Edit-2509/text_encoder/model-00003-of-00004.safetensors",
      "Qwen-Image-Edit-2509/text_encoder/model-00004-of-00004.safetensors"
    ],
    "Qwen-Image-Edit-2509/vae/diffusion_pytorch_model.safetensors"
  ]' \
  --tokenizer_path "Qwen-Image-Edit-2509/tokenizer" \
  --processor_path "Qwen-Image-Edit-2509/processor" \
  --learning_rate 1e-4 \
  --num_epochs 1 \
  --remove_prefix_in_ckpt "pipe.dit." \
  --output_path "./checkpoint_one" \
  --lora_base_model "dit" \
  --lora_target_modules "to_q,to_k,to_v,add_q_proj,add_k_proj,add_v_proj,to_out.0,to_add_out,img_mlp.net.2,img_mod.1,txt_mlp.net.2,txt_mod.1" \
  --lora_rank 32 \
  --use_gradient_checkpointing \
  --dataset_num_workers 8 \
  --find_unused_parameters \
  --base_model_precision int8-quanto \
  --text_encoder_1_precision no_change

Since this was only a test run, I set --num_epochs and --dataset_repeat to 1.

The run successfully produced ./checkpoint_one/epoch-0.safetensors, and inference with it worked without issues. (Note that inference within 48 GB of VRAM requires quantized inference, or CPU offload plus pipe.vae.enable_slicing().)
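
For the offload and slicing measures, a hedged sketch assuming a diffusers-style pipeline object; the model path is a placeholder, and your pipeline's loading and offload API may differ:

import torch
from diffusers import DiffusionPipeline

# Placeholder path; point this at your local Qwen-Image-Edit-2509 weights.
pipe = DiffusionPipeline.from_pretrained(
    "Qwen-Image-Edit-2509", torch_dtype=torch.bfloat16
)

# Keep submodules on CPU and move each to the GPU only while it runs.
pipe.enable_model_cpu_offload()

# Decode latents in slices so the VAE never materializes the whole batch.
pipe.vae.enable_slicing()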

I also tested this extreme combination:

--base_model_precision int8-quanto \
--text_encoder_1_precision int8-quanto \
--quantize_activations \
--quantize_vae

It ran for about 30 steps without problems.

The int8-quanto fine-tuning code was inspired by the SimpleTuner project; if you're interested, have a look at https://github.com/bghira/SimpleTuner

Personally, I think a good configuration is:

--base_model_precision int8-quanto \
--text_encoder_1_precision int8-quanto

The DiT is quite large, and the text encoder (the Qwen2.5-VL part) is also large, taking about 16 GB of VRAM on its own, whereas quantizing the VAE and the activations does not seem that necessary to me.

Quantization is not just about getting the run to start; it is about balancing resolution against LoRA rank. Larger resolutions and larger ranks need more VRAM, so it entirely depends on how much VRAM you have left and which resolution and rank you want. If you want a higher resolution, you have to squeeze the memory out of something else.
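
As a back-of-envelope feel for the rank side of that trade-off, a tiny calculation; the hidden size, block count, and per-parameter optimizer cost below are illustrative assumptions, not measured values for this model:

# Rough LoRA memory estimate (all constants are assumptions).
hidden = 3072                 # assumed DiT hidden width
rank = 32                     # --lora_rank
adapted_matrices = 12 * 60    # assumed: 12 target modules per block, 60 blocks

lora_params = adapted_matrices * 2 * rank * hidden  # A and B factors per matrix
bytes_per_param = 2 + 4 + 4   # bf16 weight + fp32 Adam first/second moments
print(f"{lora_params / 1e6:.1f}M trainable params, "
      f"~{lora_params * bytes_per_param / 2**30:.2f} GiB weights + optimizer state")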


Finally, I'm no expert: it runs, but whether the implementation is actually sound will need review from people more experienced than me. Thank you 🙏.

We can finally escape this evil error 🎉🎉🍺🍺:

torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 130.00 MiB. GPU 0 has a total capacity of 47.37 GiB of which 115.88 MiB is free. Including non-PyTorch memory, this process has 47.25 GiB memory in use. Of the allocated memory 46.75 GiB is allocated by PyTorch, and 130.40 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
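
As an aside, the allocator setting mentioned in that message can also be applied from Python, as long as it runs before the first CUDA allocation; this is purely a fragmentation mitigation, separate from quantization:

import os
# Must be set before torch initializes its CUDA allocator.
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
import torch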

If you want to run on 24 GB, you can try:

"int2-quanto",
"int4-quanto",

I haven't tried them, but they should work.
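
For reference, optimum-quanto does expose lower-bit weight types, so the weight-only call from the earlier sketch carries over; again untested here, and quality loss grows as the bit width shrinks:

import torch.nn as nn
from optimum.quanto import quantize, freeze, qint4

model = nn.Sequential(nn.Linear(64, 64))  # toy stand-in, as in the earlier sketch
quantize(model, weights=qint4)  # swap in qint2 for the "int2-quanto" case
freeze(model)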

Quantization is, in a sense, a philosophy. Anything that tries to generate images in fewer steps, including models like Z-Image, is a form of cost saving, but it is not free: image detail suffers a bit. The same goes for approaches that optimize generation with reinforcement learning; the parts the RL training never reached get worse.

@gemini-code-assist
Contributor

Summary of Changes

Hello @Deng-Xian-Sheng, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

The core goal of this pull request is to optimize the fine-tuning of the Qwen-Image-Edit model so that it demands less hardware. By introducing int8-quanto quantization with flexible configuration options, users can now fine-tune the model on a GPU with 48 GB of VRAM, greatly expanding the range of usable hardware. The training script is also improved with configurable dataset field naming, which increases usability and flexibility.

Highlights

  • int8-quanto quantized fine-tuning: this PR introduces int8-quanto quantized fine-tuning for the Qwen-Image-Edit model, significantly reducing VRAM requirements so that 48 GB suffices.
  • New training script parameters: train.py gains --result_image_field_name, --base_model_precision, --text_encoder_1_precision, --quantize_activations, and --quantize_vae, offering finer-grained quantization control and dataset field configuration.
  • Reduced VRAM usage: quantizing the DiT, the text encoder, and optionally the VAE resolves CUDA out-of-memory failures and makes training feasible.
  • Quanto/PEFT compatibility: a compatibility workaround between Quanto and PEFT is added to keep LoRA fine-tuning correct on quantized models.

@gemini-code-assist bot left a comment

Code Review

This PR introduces int8-quanto quantized fine-tuning support for the Qwen-Image-Edit model, a very practical feature that significantly reduces VRAM usage and lets more users fine-tune on consumer hardware. In the implementation, the quantization logic is nicely modularized into the diffsynth/utils/quantisation package, and the compatibility issue between peft and quanto is resolved with a clear monkey-patch. The new result_image_field_name parameter also improves the script's flexibility and readability.

Overall the code quality is good. My suggestions mainly target maintainability and robustness, such as simplifying redundant code and reducing coupling between modules; I hope they help you polish the code further. Thanks for your contribution!

Comment on lines +103 to +160
models = [
    (
        model,
        {
            "quant_fn": get_quant_fn(args.base_model_precision),
            "model_precision": args.base_model_precision,
            "quantize_activations": getattr(args, "quantize_activations", False),
        },
    ),
    (
        controlnet,
        {
            "quant_fn": get_quant_fn(args.base_model_precision),
            "model_precision": args.base_model_precision,
            "quantize_activations": getattr(args, "quantize_activations", False),
        },
    ),
    (
        te1,
        {
            "quant_fn": get_quant_fn(args.text_encoder_1_precision),
            "model_precision": args.text_encoder_1_precision,
            "base_model_precision": args.base_model_precision,
        },
    ),
    (
        te2,
        {
            "quant_fn": get_quant_fn(args.text_encoder_2_precision),
            "model_precision": args.text_encoder_2_precision,
            "base_model_precision": args.base_model_precision,
        },
    ),
    (
        te3,
        {
            "quant_fn": get_quant_fn(args.text_encoder_3_precision),
            "model_precision": args.text_encoder_3_precision,
            "base_model_precision": args.base_model_precision,
        },
    ),
    (
        te4,
        {
            "quant_fn": get_quant_fn(args.text_encoder_4_precision),
            "model_precision": args.text_encoder_4_precision,
            "base_model_precision": args.base_model_precision,
        },
    ),
    (
        ema,
        {
            "quant_fn": get_quant_fn(args.base_model_precision),
            "model_precision": args.base_model_precision,
            "quantize_activations": getattr(args, "quantize_activations", False),
        },
    ),
]

high

The construction of the models list contains a lot of duplicated code, which makes it hard to read and maintain. It also relies on the caller (train.py) to supply an args object carrying every text_encoder_*_precision attribute, creating unnecessary tight coupling (see the fake_args construction in train.py).

Consider refactoring this into a configuration list plus a loop, and using getattr with a default value to make the function more robust, thereby decoupling this utility function from the specific training script.

    model_definitions = [
        {"model": model, "precision_key": "base_model_precision", "quantize_activations": True},
        {"model": controlnet, "precision_key": "base_model_precision", "quantize_activations": True},
        {"model": te1, "precision_key": "text_encoder_1_precision"},
        {"model": te2, "precision_key": "text_encoder_2_precision"},
        {"model": te3, "precision_key": "text_encoder_3_precision"},
        {"model": te4, "precision_key": "text_encoder_4_precision"},
        {"model": ema, "precision_key": "base_model_precision", "quantize_activations": True},
    ]

    models = []
    for definition in model_definitions:
        m = definition["model"]
        precision_key = definition["precision_key"]
        # Use getattr with a default to make this function more robust
        model_precision = getattr(args, precision_key, "no_change")
        
        qargs = {
            "quant_fn": get_quant_fn(model_precision),
            "model_precision": model_precision,
            "base_model_precision": args.base_model_precision,
        }
        if definition.get("quantize_activations"):
            qargs["quantize_activations"] = getattr(args, "quantize_activations", False)
        
        models.append((m, qargs))

Comment on lines +64 to +71
fake_args = SimpleNamespace(
    base_model_precision=base_model_precision,
    text_encoder_1_precision=text_encoder_1_precision,
    text_encoder_2_precision="no_change",
    text_encoder_3_precision="no_change",
    text_encoder_4_precision="no_change",
    quantize_activations=quantize_activations,
)

high

Creating fake_args here to fill in the parameters quantise_model requires is a tightly coupled design. If the quantise_model interface changes (for example, it grows a text_encoder_5_precision), this code must change in lockstep, which is error-prone. A more robust approach is to let quantise_model handle missing attributes itself, e.g. getattr(args, "text_encoder_2_precision", "no_change") as a default. That decouples the training script from the quantization utility.

Comment on lines +92 to +101
te1 = te2 = te3 = te4 = None
if text_encoders is not None:
    if len(text_encoders) > 0:
        te1 = text_encoders[0]
    if len(text_encoders) > 1:
        te2 = text_encoders[1]
    if len(text_encoders) > 2:
        te3 = text_encoders[2]
    if len(text_encoders) > 3:
        te4 = text_encoders[3]

medium

This code for unpacking text_encoders is somewhat redundant and hard-codes a maximum of four encoders. It can be simplified for readability and extensibility.

    te = [None] * 4
    if text_encoders is not None:
        num_tes = min(len(text_encoders), 4)
        te[:num_tes] = text_encoders[:num_tes]
    te1, te2, te3, te4 = te

Comment on lines +180 to +190
new_text_encoders = []
if te1 is not None:
    new_text_encoders.append(te1)
if te2 is not None:
    new_text_encoders.append(te2)
if te3 is not None:
    new_text_encoders.append(te3)
if te4 is not None:
    new_text_encoders.append(te4)
if len(new_text_encoders) == 0:
    new_text_encoders = None

medium

This repacking of text_encoders is also somewhat redundant. A list comprehension makes it more concise.

    new_text_encoders = [te for te in [te1, te2, te3, te4] if te is not None]
    if not new_text_encoders:
        new_text_encoders = None

Comment on lines +78 to +81
if self.disable_adapters:
    if self.merged:
        self.unmerge()
    result = self.base_layer(x, *args, **kwargs)

medium

The forward method contains a duplicated if self.disable_adapters: check. The first check (line 75) already ends in a return, so this second block (lines 78-81) is unreachable dead code and should be removed.

Comment on lines +45 to +48
use_quanto = (
    (base_model_precision is not None and "quanto" in base_model_precision.lower())
    or (text_encoder_1_precision is not None and "quanto" in text_encoder_1_precision.lower())
)

medium

The use_quanto check can be written more concisely; any() with a generator expression reads better.

Suggested change

Before:

use_quanto = (
    (base_model_precision is not None and "quanto" in base_model_precision.lower())
    or (text_encoder_1_precision is not None and "quanto" in text_encoder_1_precision.lower())
)

After:

use_quanto = any(
    p is not None and "quanto" in p.lower()
    for p in [base_model_precision, text_encoder_1_precision]
)
