Stable Diffusion
Generate high-quality images from text prompts using self-hosted Stable Diffusion models
Alternative To
- Midjourney
- DALL-E
- Adobe Firefly
Difficulty Level
Requires some technical experience. Moderate setup complexity.
Overview
Stable Diffusion is a state-of-the-art text-to-image latent diffusion model that generates detailed images from text descriptions. Developed by Stability AI in collaboration with the CompVis group at LMU Munich and Runway, this open-source model has revolutionized AI image generation by making powerful image synthesis technology publicly accessible. The latest version, Stable Diffusion 3.5 (released in 2024), offers significant improvements in image quality, prompt adherence, and performance over previous versions.
System Requirements
- CPU: 4+ cores (modern multi-core processor recommended)
- RAM: 16GB minimum (32GB+ recommended, especially for newer versions)
- GPU (a quick script to check your GPU and VRAM is sketched after this list):
  - NVIDIA GPU with 8GB+ VRAM for SD 1.x and 2.x models
  - NVIDIA GPU with 12GB+ VRAM for SDXL models
  - NVIDIA GPU with 24GB+ VRAM recommended for SD 3.x Large models
  - SD 3.5 Medium requires approximately 10GB VRAM
- Storage: 20GB+ for base models and dependencies (100GB+ recommended for multiple models)
- OS: Windows, Linux, or macOS with proper GPU drivers
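If you are unsure what your GPU reports, a quick check with PyTorch is sketched below (this assumes PyTorch is already installed; the installers in the next section pull it in automatically):

```python
# Minimal hardware check: prints the detected GPU and its total VRAM
import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU:  {props.name}")
    print(f"VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected; CPU-only generation will be very slow.")
```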
Installation Guide
There are multiple ways to run Stable Diffusion locally. Here are the most common approaches:
Option 1: Using Stable Diffusion WebUI (AUTOMATIC1111)
This is the most popular and user-friendly way to run Stable Diffusion locally.
Clone the repository:
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
Navigate to the directory and run the appropriate script.
For Windows:
cd stable-diffusion-webui
webui-user.bat
For Linux/macOS:
cd stable-diffusion-webui
bash webui.sh
The script will automatically install the required dependencies and download a default model.
Access the WebUI at http://localhost:7860 in your browser.
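If you hit VRAM limits or want extra features, the WebUI reads launch options from the COMMANDLINE_ARGS variable in webui-user.bat (webui-user.sh on Linux/macOS). The example below uses a few commonly seen flags; treat it as a sketch and check the project wiki, since the available flags vary between releases:

```
REM In webui-user.bat (Windows): reduce VRAM usage and enable xformers attention
set COMMANDLINE_ARGS=--medvram --xformers --autolaunch
```

On Linux/macOS the equivalent line in webui-user.sh is export COMMANDLINE_ARGS="--medvram --xformers".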
Option 2: Using ComfyUI
ComfyUI offers a more customizable, node-based approach to running Stable Diffusion.
Clone the repository:
git clone https://github.com/comfyanonymous/ComfyUI.git
Navigate to the directory:
cd ComfyUI
Install dependencies:
pip install -r requirements.txt
Run ComfyUI:
python main.py
Access the interface at http://localhost:8188 in your browser.
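Before generating anything, place at least one model checkpoint in ComfyUI's models folder. The launch options below are commonly documented ones; confirm what your version supports with python main.py --help:

```
# Checkpoints (.safetensors / .ckpt files) go here before you start ComfyUI:
#   ComfyUI/models/checkpoints/

# Example launch: expose the UI on your network and reduce VRAM usage
python main.py --listen 0.0.0.0 --port 8188 --lowvram
```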
Option 3: Direct API Usage with Diffusers Library
For developers who prefer programmatic access:
Install the Diffusers library:
pip install diffusers transformers accelerate safetensors
Create a Python script to generate images:

import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # Choose your preferred model

pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

prompt = "A serene landscape with mountains and a lake at sunset, highly detailed"
image = pipe(prompt).images[0]
image.save("generated_image.png")
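If the script above runs out of GPU memory, Diffusers ships built-in optimizations you can enable on the same pipe object. A minimal sketch (method names per recent diffusers releases; CPU offloading also requires the accelerate package):

```python
# Reduce peak VRAM usage at a small speed cost
pipe.enable_attention_slicing()

# Alternatively, offload idle submodules to CPU between denoising steps.
# Note: call this *instead of* pipe.to("cuda"), not in addition to it.
pipe.enable_model_cpu_offload()
```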
Practical Exercise: Creating Images with Stable Diffusion
Step 1: Basic Text-to-Image Generation
Once you have Stable Diffusion installed, try generating your first image:
If using Stable Diffusion WebUI, navigate to the “txt2img” tab.
Enter a detailed prompt in the “Prompt” field, for example:
A beautiful fantasy landscape with floating islands, waterfalls, and ancient ruins, lush vegetation, magical atmosphere, golden hour lighting, highly detailed, 8k resolution
For better results, add a negative prompt:
blurry, low quality, distorted, deformed, ugly, bad anatomy
Set initial generation parameters (a Diffusers equivalent is sketched at the end of this step):
- Sampling method: DPM++ 2M Karras
- Sampling steps: 30
- Width/Height: 512x512 (or 1024x1024 for SDXL/SD3)
- CFG Scale: 7.5
- Seed: -1 (random)
Click “Generate” to create your image.
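If you followed the Diffusers route instead of the WebUI, the same settings map onto pipeline arguments. The sketch below is a rough equivalent of the parameters above; the "DPM++ 2M Karras" mapping to DPMSolverMultistepScheduler with Karras sigmas is an approximation:

```python
import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Roughly corresponds to the WebUI's "DPM++ 2M Karras" sampler
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)

image = pipe(
    prompt="A beautiful fantasy landscape with floating islands, waterfalls, and ancient ruins",
    negative_prompt="blurry, low quality, distorted, deformed, ugly, bad anatomy",
    num_inference_steps=30,   # Sampling steps
    guidance_scale=7.5,       # CFG Scale
    width=512, height=512,
    generator=torch.Generator("cuda").manual_seed(12345),  # fixed seed; omit for random
).images[0]
image.save("fantasy_landscape.png")
```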
Step 2: Refining Your Results
Experiment with different settings to improve your results (a parameter-sweep sketch follows this step):
Adjust CFG Scale:
- Lower values (5-7): More creative but potentially less aligned with prompt
- Higher values (8-12): More literal interpretation of prompt
Try Different Samplers:
- DPM++ 2M Karras: Good overall quality
- Euler a: Fast with good results
- DDIM: Good for consistency with seeds
Adjust Sampling Steps:
- 20-30 steps are usually sufficient
- More steps (40+) can add fine detail, but with diminishing returns
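A practical way to compare these settings is to hold the seed fixed and sweep one parameter at a time, so any difference comes from that parameter alone. A small Diffusers sketch along those lines (the model ID, prompt, and values are just examples):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "Portrait of an astronaut in a sunflower field, detailed, soft lighting"

# Same seed every iteration; only the CFG scale changes between images
for cfg in (5.0, 7.5, 10.0, 12.0):
    generator = torch.Generator("cuda").manual_seed(1234)
    image = pipe(prompt, guidance_scale=cfg, num_inference_steps=30,
                 generator=generator).images[0]
    image.save(f"cfg_{cfg:.1f}.png")
```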
Step 3: Advanced Techniques
Once you’re comfortable with basic generation, try these advanced features:
- Image-to-Image: Upload an existing image and transform it based on your prompt (a Diffusers sketch follows this list)
- Inpainting: Selectively edit parts of an image by masking areas
- Using ControlNet: Guide generation with edges, poses, or depth maps
- Prompt Engineering: Experiment with different prompt structures, for example:
(subject:1.2), setting, lighting, style, (details:1.1), quality modifiers
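As a concrete example of the image-to-image workflow, here is a minimal sketch using Diffusers' StableDiffusionImg2ImgPipeline (class and arguments per recent diffusers releases; "sketch.png" is a placeholder for any starting image):

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load and resize the starting image (placeholder filename)
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))

image = pipe(
    prompt="A detailed oil painting of a castle on a cliff at sunset",
    image=init_image,
    strength=0.75,       # how far the result may drift from the original (0-1)
    guidance_scale=7.5,
).images[0]
image.save("img2img_result.png")
```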
Model Versions
- Stable Diffusion 1.x: The original models, good for general use
- Stable Diffusion 2.x: Improved with better text understanding
- Stable Diffusion XL: Higher resolution with better quality (a loading sketch follows this list)
- Stable Diffusion 3.x/3.5: Latest architecture with superior quality and prompt adherence
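Newer model families use their own pipeline classes in Diffusers. For example, loading SDXL looks roughly like this (model ID and class name as published on Hugging Face; SD 3.x models use StableDiffusion3Pipeline and may require accepting a license on Hugging Face first):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
).to("cuda")

# SDXL is trained for 1024x1024 output
image = pipe(
    "A futuristic city skyline at dusk, ultra detailed",
    width=1024, height=1024,
).images[0]
image.save("sdxl_example.png")
```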
Suggested Projects
You might also be interested in these similar projects:
- A user-friendly image generation platform based on Stable Diffusion XL with Midjourney-like simplicity
- A powerful node-based interface for Stable Diffusion image generation workflows
- An optimized Stable Diffusion WebUI with improved performance, reduced VRAM usage, and advanced features