LoRA Fine-Tuning in Python — Core Concepts

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that changes the behavior of a pretrained model by injecting small trainable matrices into its layers while keeping the original weights frozen. Instead of updating millions or billions of parameters, LoRA trains only a small fraction, typically 0.1% to 1% of the original parameter count.

The core idea

In a standard neural network layer, the weight matrix W might be 4096 × 4096, roughly 16 million parameters. Full fine-tuning adjusts all of them. LoRA instead freezes W and adds two small trainable matrices: A (rank × 4096) and B (4096 × rank), where rank is a small number like 4, 8, or 16.

In the forward pass, during both training and inference, the output becomes: output = W·x + B·A·x
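
The formula can be sketched in plain Python with toy sizes. This is a minimal illustration, not any library's implementation: the helpers `matvec` and `lora_forward` and the 3 × 3 layer are made up for the example, and it assumes A has shape (rank × d_in) and B has shape (d_out × rank) so that B·A matches W. B starts at zero, following the initialization described in the original LoRA paper, so the adapted layer initially reproduces the base layer.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(W, A, B, x):
    """Compute W·x + B·(A·x) without ever forming the full B·A matrix."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))  # project down to rank, then back up
    return [b + d for b, d in zip(base, low_rank)]

# Toy layer: d_in = d_out = 3, rank = 1 (all values are illustrative)
W = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]  # frozen pretrained weights
A = [[1, 2, 3]]                        # rank x d_in, trainable
B_zero = [[0], [0], [0]]               # d_out x rank, zero at initialization
B_trained = [[1], [0], [-1]]           # hypothetical values after training
x = [1, 1, 1]

print(lora_forward(W, A, B_zero, x))     # [3, 1, 4], identical to W·x
print(lora_forward(W, A, B_trained, x))  # [9, 1, -2], base plus low-rank update
```

Computing B·(A·x) rather than (B·A)·x keeps the extra work proportional to the rank instead of to the full layer size.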

The product B·A has the same dimensions as W, but because rank is small, A and B together contain far fewer parameters. With rank 8, you go from about 16 million trainable parameters to 65,536 (2 × 4096 × 8), a 256x reduction.
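
The arithmetic is easy to check directly. The helper below is a hypothetical name introduced for illustration; it simply counts the entries of W against the entries of A and B for a single layer.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_out * d_in                # every entry of W
    lora = rank * d_in + d_out * rank  # A is rank x d_in, B is d_out x rank
    return full, lora

for rank in (4, 8, 16):
    full, lora = lora_param_counts(4096, 4096, rank)
    print(f"rank={rank:>2}: full={full:,} lora={lora:,} reduction={full // lora}x")
```

For a 4096 × 4096 layer this prints reductions of 512x, 256x, and 128x for ranks 4, 8, and 16: doubling the rank doubles the adapter size.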

Why it works

Large language models and image generators are vastly over-parameterized. The original LoRA paper (Hu et al., 2021) hypothesizes, and presents experiments suggesting, that the weight changes needed for task-specific adaptation tend to live in a low-rank subspace. LoRA exploits this by constraining updates to a low-rank decomposition, which can also act as a regularizer and, on small datasets, may match or sometimes beat full fine-tuning.

LoRA for image generation

Training a LoRA for Stable Diffusion typically targets the cross-attention layers of the U-Net — the layers where text conditioning meets image features:

from diffusers import StableDiffusionPipeline
import torch

# Load base model
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")

# Load a trained LoRA
pipe.load_lora_weights("path/to/lora/weights")

# Generate with LoRA influence
image = pipe(
    "a portrait in the style of <trained_concept>",
    num_inference_steps=30,
).images[0]

# Remove LoRA to restore base model behavior
pipe.unload_lora_weights()

LoRA for language models with PEFT

The peft library from Hugging Face provides a clean interface for applying LoRA to transformer models:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

lora_config = LoraConfig(
    r=16,                      # rank
    lora_alpha=32,             # scaling factor
    target_modules=["q_proj", "v_proj"],  # which layers to adapt
    lora_dropout=0.05,
    bias="none",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# Output with r=16 (approximate): trainable params: 8,388,608 || all params: 6,746,804,224 || trainable%: 0.1243
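
The trainable-parameter figure can be estimated by hand. The sketch below assumes Llama-2-7B's published architecture (32 decoder layers, hidden size 4096, square 4096 × 4096 q_proj and v_proj matrices); the function name is made up for illustration and is not part of peft.

```python
def lora_trainable_params(num_layers, hidden_size, rank, modules_per_layer):
    """Count LoRA parameters when every targeted module is hidden x hidden."""
    per_module = rank * hidden_size + hidden_size * rank  # A plus B
    return num_layers * modules_per_layer * per_module

# q_proj and v_proj in each of the 32 layers -> 2 adapted modules per layer
for rank in (8, 16):
    count = lora_trainable_params(32, 4096, rank, modules_per_layer=2)
    print(f"r={rank}: {count:,} trainable parameters")
```

Under these assumptions r=8 gives 4,194,304 trainable parameters and r=16 gives 8,388,608, so the count scales linearly with the rank.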

Key parameters

Rank (r): Controls the capacity of the adaptation. As a rule of thumb, rank 4 is often enough for simple style changes, while rank 16–64 suits more complex behavioral shifts. Higher rank means more trainable parameters and longer training.

Alpha: A scaling factor applied to the LoRA output. The effective scaling is alpha / rank. Common practice sets alpha to twice the rank value.

Target modules: Which layers get LoRA adapters. For attention-based models, query and value projections are standard targets. Adding key projections and feed-forward layers increases capacity but also training time.
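
How rank and alpha interact is easiest to see numerically. The snippet below is a small sketch using the alpha / rank formula given above; `lora_scale` is a hypothetical helper, not a peft function.

```python
def lora_scale(alpha, rank):
    """Effective multiplier applied to the LoRA update B·A·x."""
    return alpha / rank

ranks = (4, 8, 16, 64)

# Keeping alpha at twice the rank holds the multiplier constant
print([lora_scale(2 * r, r) for r in ranks])  # [2.0, 2.0, 2.0, 2.0]

# Keeping alpha fixed while raising the rank shrinks the multiplier
print([lora_scale(32, r) for r in ranks])     # [8.0, 4.0, 2.0, 0.5]
```

This is one reason to revisit alpha whenever the rank changes: with a fixed alpha, a higher-rank adapter contributes with a smaller multiplier.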

Combining multiple LoRAs

LoRAs can be stacked with different weights:

pipe.load_lora_weights("style_lora", adapter_name="style")
pipe.load_lora_weights("lighting_lora", adapter_name="lighting")
pipe.set_adapters(["style", "lighting"], adapter_weights=[0.8, 0.5])

This lets you mix a style LoRA at 80% strength with a lighting LoRA at 50% strength, composing effects that neither achieves alone.
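
Conceptually, weighted stacking can be pictured as adding each adapter's low-rank update to the frozen weights, scaled by that adapter's weight. The sketch below is a simplified model of that idea with toy 2 × 2 matrices, not the actual diffusers implementation; `matmul`, `combine`, and all numeric values are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def combine(W, adapters, weights):
    """Return W + sum_i weight_i * (B_i · A_i), leaving W itself unchanged."""
    out = [row[:] for row in W]  # copy so the base weights stay intact
    for (B, A), w in zip(adapters, weights):
        delta = matmul(B, A)
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                out[i][j] += w * value
    return out

W = [[1.0, 0.0], [0.0, 1.0]]                # frozen base weights
style = ([[1.0], [0.0]], [[0.0, 10.0]])     # (B, A) pair with rank 1
lighting = ([[0.0], [1.0]], [[10.0, 0.0]])  # (B, A) pair with rank 1

print(combine(W, [style, lighting], [0.8, 0.5]))  # [[1.0, 8.0], [5.0, 1.0]]
```

Because the two toy adapters touch different entries, their contributions are easy to tell apart in the result; real adapters usually overlap, which is why combined strengths often need tuning by eye.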

Common misconception

LoRA does not compress or simplify the base model. The original model runs exactly as before — LoRA adds a small parallel path. At inference time, the LoRA matrices can be merged into the base weights for zero overhead, but the model itself does not become smaller.
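
Merging can be illustrated with a toy example. The sketch below uses made-up 2 × 2 integer weights and hypothetical helpers (`matmul`, `matvec`); it checks that folding B·A into W gives the same output as the parallel path, and that the merged matrix has the same shape as W.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

W = [[1, 2], [3, 4]]  # frozen base weights
B = [[1], [2]]        # d_out x rank
A = [[5, 6]]          # rank x d_in
x = [1, -1]

# Parallel path: base output plus the adapter output
parallel = [b + d for b, d in zip(matvec(W, x), matvec(B, matvec(A, x)))]

# Merged path: fold B·A into W once, then run a single matrix multiply
delta = matmul(B, A)
W_merged = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
merged = matvec(W_merged, x)

print(parallel, merged)                 # [-2, -3] [-2, -3]
print(len(W_merged), len(W_merged[0]))  # 2 2, the same shape as W
```

The merged layer costs exactly one matrix multiply per input, just like the original, but it stores just as many numbers as W did.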

When to use LoRA vs. full fine-tuning

| Scenario | LoRA | Full fine-tuning |
| --- | --- | --- |
| Small dataset (< 10k examples) | Preferred | Risk of overfitting |
| Consumer GPU (8–16 GB) | Fits easily | Often impossible |
| Multiple specialized models | Store small adapter files | Store full model copies |
| Maximum quality, unlimited budget | Good but not always best | Can squeeze more performance |
| Quick iteration | Minutes to hours | Hours to days |
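
The storage trade-off for multiple specialized models can be estimated with a back-of-the-envelope calculation. All figures below are hypothetical round numbers (a 7B-parameter base model, 8M-parameter adapters, ten tasks, fp16 storage at 2 bytes per parameter), chosen only to show the order of magnitude.

```python
def size_gib(num_params, bytes_per_param=2):
    """Approximate storage in GiB, assuming 2 bytes per parameter (fp16)."""
    return num_params * bytes_per_param / 2**30

BASE_PARAMS = 7_000_000_000  # hypothetical 7B-parameter base model
ADAPTER_PARAMS = 8_000_000   # hypothetical LoRA adapter
NUM_TASKS = 10               # number of specialized variants to keep

# Full fine-tuning keeps one complete model copy per task
full_copies = NUM_TASKS * size_gib(BASE_PARAMS)

# LoRA keeps one shared base model plus one small adapter per task
base_plus_adapters = size_gib(BASE_PARAMS) + NUM_TASKS * size_gib(ADAPTER_PARAMS)

print(f"{NUM_TASKS} full fine-tuned copies: {full_copies:.1f} GiB")
print(f"1 base model + {NUM_TASKS} adapters: {base_plus_adapters:.1f} GiB")
```

Under these assumptions, the adapters add only a small amount on top of a single copy of the base model, while full fine-tuning multiplies the base size by the number of tasks.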

One thing to remember: LoRA decomposes weight updates into two small matrices that capture task-specific changes in a fraction of the space, making fine-tuning accessible on consumer hardware while keeping the original model intact.

