How to train LoRA model?

Since the advent of the AIGC era, LoRA (Low-Rank Adaptation of Large Language Models) has undoubtedly become the model most commonly paired with the Stable Diffusion (SD) series in the field of AI painting. The combination of an SD model and a LoRA model has produced one exquisite image and video after another, in many imaginative styles, greatly improving both the quality and the efficiency of AI painting.

In this article, we give an in-depth analysis and summary of the LoRA model, covering its basic principles and a training tutorial from 0 to 1.

Table of Contents

  1. The basics of LoRA model
    1. The core principles of LoRA model
  2. The benefits of LoRA model
  3. Get started training the LoRA model from 0 to 1
    1. Use the Kohya-trainer framework to train the LoRA model
    2. Train LoRA model using diffusers framework

The basics of LoRA model

After the outbreak of AIGC, large models entered the industry's field of vision. Their powerful capabilities took the AI community by storm, and it is fair to say that almost everyone has heard of Stable Diffusion and ChatGPT.

However, large models have an enormous number of parameters, which makes training them very expensive. For many downstream, fine-grained tasks, full-parameter training of a large model is not cost-effective, and the domains of these downstream tasks are relatively easy to constrain. This is where the protagonist of this article, LoRA, comes in.

The core principles of LoRA model

The training logic of the LoRA model is to first freeze the weights of the SD model, then inject LoRA modules into the U-Net of the SD model, attached to its CrossAttention layers, and fine-tune only these injected parameters.

In other words, we no longer perform full-parameter fine-tuning of the SD model weights. Instead, we add a trainable low-rank residual to the frozen weights and optimize only that residual during training:

W = W₀ + ΔW = W₀ + BA

Here W₀ is the original d×k weight matrix, B is a d×r matrix, and A is an r×k matrix; r is the rank (Rank, lora_dim) of the residual ΔW, which is obtained as the product of the two low-rank matrices B and A. Since the domain of the downstream task is narrow, r can be chosen very small, far smaller than d and k; in practice a value such as 8 (used in the example below) is often enough. Therefore, after training is completed, we obtain a LoRA model with far fewer parameters than the SD model.

To give an intuitive example, suppose the original parameter matrix is 100*1024, so it has 102400 parameters. LoRA replaces this matrix update with the product of two smaller matrices. If we set Rank=8, we get a 100*8 matrix B multiplied by an 8*1024 matrix A, for a total of 800 + 8192 = 8992 parameters, a reduction of about 11.39 times in the number of trainable parameters.
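
A quick back-of-the-envelope check of these numbers (a trivial Python sketch, just to make the arithmetic explicit):

full_params = 100 * 1024             # original weight matrix: 102400 parameters
lora_params = 100 * 8 + 8 * 1024     # B (100x8) + A (8x1024): 800 + 8192 = 8992
print(full_params / lora_params)     # roughly 11.39x fewer trainable parameters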

The matrix product BA has the same dimensions as the corresponding SD model weight matrix, while the two smaller factor matrices ensure that the weight update stays low-rank, which significantly reduces the number of parameters that need to be trained.

Generally speaking, matrix A is initialized from a random Gaussian distribution and matrix B is initialized to all zeros, so that the product BA is zero in the initial state. This ensures that only the SD model (main model) takes effect at the beginning of training.
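
To make the structure concrete, here is a minimal PyTorch sketch of a LoRA-wrapped linear layer. It is an illustration only, not the actual implementation used by kohya-trainer or diffusers, and the class name LoRALinear and the rank/alpha defaults are just examples:

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Minimal LoRA wrapper around a frozen nn.Linear (illustration only).
    def __init__(self, base_linear: nn.Linear, rank: int = 8, alpha: float = 8.0):
        super().__init__()
        self.base = base_linear
        # Freeze the original (SD) weights; only the LoRA factors are trained.
        for p in self.base.parameters():
            p.requires_grad_(False)
        # A: random Gaussian init, B: zero init, so B @ A = 0 at the start
        # and only the frozen base weight takes effect initially.
        self.lora_A = nn.Parameter(torch.randn(rank, base_linear.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base_linear.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = W0 x + (alpha / r) * B A x
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)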

The LoRA model greatly reduces memory usage during SD model training. Because the main model (the SD model) is not being optimized, no optimizer state needs to be stored for its parameters. The amount of computation, however, does not change significantly, since the LoRA "residual" branch is still computed on top of the main model's forward pass; what is saved is the optimizer update of the main model's weights.
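
Continuing the sketch above, only the LoRA parameters are handed to the optimizer, which is where the memory saving comes from (the frozen base weights carry no optimizer state):

# Only trainable (LoRA) parameters go to the optimizer, so no AdamW state
# is allocated for the frozen base weights.
layer = LoRALinear(nn.Linear(1024, 1024), rank=8)
trainable_params = [p for p in layer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable_params, lr=1e-4)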

When we merge the LoRA model into the SD main model, we can directly compute the weight residual from the LoRA parameters and add it to the weights of the SD main model. The structure of the SD main model does not change; only its weights are updated.
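
Continuing the same sketch, merging the trained LoRA residual back into the base weight can be written as follows; after the merge, the layer behaves like a plain linear layer with updated weights:

with torch.no_grad():
    # W = W0 + (alpha / r) * B @ A  -- the residual has the same shape as W0
    layer.base.weight += layer.scale * (layer.lora_B @ layer.lora_A)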

The benefits of LoRA model

  • Faster: Compared with full-parameter training of the SD series models, LoRA model training is much faster.
  • Low computing power requirement: LoRA has very low compute requirements and can be trained on a 2080 Ti-class GPU.
  • Small parameter count: Since only the injected modules are trained on top of the SD model, the LoRA model itself is very small, as little as about 3M.
  • Great compatibility: The LoRA model can be trained on small datasets (even just a few images is enough) and is well compatible with, and transferable between, different SD models.
  • Better learning ability: The main model's parameters remain unchanged during training, so the LoRA model can build on and optimize over the main model's existing capabilities.
  • High efficiency: Because weight updates are parameterized so compactly, switching between and loading LoRA models is efficient and easy.


Get started training the LoRA model from 0 to 1

Below we walk through two frameworks for training the LoRA model.

Use the Kohya-trainer framework to train the LoRA model

Suppose you want to teach the model your own likeness. You first need to collect data, use your own photos as training data, then train a new model and save it.

  1. Data preparation

First, you need to collect pictures, which can be downloaded with a crawler or directly from a search engine. Here we choose to download directly from Google image search. Note that the quality of the prepared data determines the quality of your final model: if you feed the model low-quality images, the images it generates will also be low quality, so try to ensure the images are sharp and high resolution.
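
As an optional helper, a small script like the sketch below can flag low-resolution images before training (the folder name and minimum resolution are only assumptions; adjust them to your setup):

from pathlib import Path
from PIL import Image  # pip install pillow

MIN_SIDE = 512  # assumed minimum side length; match your base model's resolution

for path in Path("data/raw_images").glob("*.jpg"):  # hypothetical download folder
    with Image.open(path) as img:
        if min(img.size) < MIN_SIDE:
            print(f"Low resolution, consider removing: {path} ({img.size[0]}x{img.size[1]})")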

  2. Train the model

It is recommended to use the kohya_ss project, which provides a GUI and is suitable for readers without programming experience. After completing the installation according to its installation instructions, open the GUI and select the Dreambooth LoRA tab.

  3. Label the image

Enter the location of the images, select Basic Caption, and add your keyword in the Prefix field. For example, our keyword here is caixukun. You can also add a more detailed description of the image, such as "an asian man, star". Note that this keyword and description are very important: you will enter the keyword at generation time to get the effect you want.
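
If you prefer preparing captions outside the GUI, kohya-style trainers can also read a .txt caption file with the same name as each image. A minimal sketch (the folder name and caption text are only examples, matching the keyword above):

from pathlib import Path

image_dir = Path("data/lora_train/200_caixukun")   # hypothetical training folder
caption = "caixukun, an asian man, star"           # keyword prefix plus description

for img_path in sorted(image_dir.glob("*.jpg")):
    # Write a sidecar caption file, e.g. 001.jpg -> 001.txt
    img_path.with_suffix(".txt").write_text(caption)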

Rename the training image folder so that it has the prefix 200_, which means each image will be repeated 200 times. The total number of steps in the entire training process is then 200 * number of images.
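
For reference, a kohya-style training folder typically looks like the layout below (file names are only examples); the 200_ prefix on the subfolder controls the repeat count:

data/lora_train/
  200_caixukun/
    001.jpg
    001.txt
    002.jpg
    002.txt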

To start training, switch to the training page, enter the image path data/lora_train, name the model, and click to start training. Note that the image folder (the 200_ folder above) is a subfolder of lora_train, so be careful not to enter the wrong path here.

Move the model generated by training into the stable-diffusion-webui/models/Lora folder, enter a prompt containing your keyword (for example caixukun), and test the effect.

Train LoRA model using diffusers framework

Another way to train a LoRA model is to use the training code in the diffusers library directly.

Download the diffusers library code and install the dependencies required to run it with the following commands:

git clone https://github.com/huggingface/diffusers.git

pip install --upgrade diffusers[torch] -i https://pypi.tuna.tsinghua.edu.cn/simple

After downloading the diffusers library code and installing the required dependencies, we can start using the code in the diffusers library to train SD LoRA and SDXL LoRA models.
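
Once training has produced a set of LoRA weights, you can quickly test them in a diffusers pipeline. The sketch below assumes a recent diffusers version, an SD 1.5 base model, and an example output directory name:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("output/lora_caixukun")  # hypothetical training output directory
image = pipe("caixukun, an asian man, star", num_inference_steps=30).images[0]
image.save("lora_test.png")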