
Stable Diffusion

Generate high-quality images from text prompts using self-hosted Stable Diffusion models

Tags: Intermediate · open-source · self-hosted · AI Art · text-to-image · deep learning

Alternative To

  • Midjourney
  • DALL-E
  • Adobe Firefly

Difficulty Level

Intermediate

Requires some technical experience. Moderate setup complexity.

Overview

Stable Diffusion is a state-of-the-art text-to-image latent diffusion model that generates detailed images from text descriptions. Developed by Stability AI in collaboration with the CompVis group at LMU Munich and Runway, this open-source model has revolutionized AI image generation by making powerful image synthesis technology publicly accessible. The latest version, Stable Diffusion 3.5 (released in 2024), offers significant improvements in image quality, prompt adherence, and performance over previous versions.

System Requirements

  • CPU: 4+ cores (modern multi-core processor recommended)
  • RAM: 16GB minimum (32GB+ recommended, especially for newer versions)
  • GPU:
    • NVIDIA GPU with 8GB+ VRAM for SD 1.x and 2.x models
    • NVIDIA GPU with 12GB+ VRAM for SDXL models
    • NVIDIA GPU with ~10GB VRAM for SD 3.5 Medium
    • NVIDIA GPU with 24GB+ VRAM recommended for SD 3.x Large models
  • Storage: 20GB+ for base models and dependencies (100GB+ recommended for multiple models)
  • OS: Windows, Linux, or macOS with proper GPU drivers
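
Not sure whether your GPU clears these VRAM thresholds? A minimal check, assuming PyTorch with CUDA support is already installed (the snippet only reads device properties and does not load any model):

    import torch

    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(0)
        vram_gb = props.total_memory / (1024 ** 3)
        print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
    else:
        print("No CUDA-capable GPU detected; generation will fall back to the CPU and be very slow.")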

Installation Guide

There are multiple ways to run Stable Diffusion locally. Here are the most common approaches:

Option 1: Using Stable Diffusion WebUI (AUTOMATIC1111)

This is the most popular and user-friendly way to run Stable Diffusion locally.

  1. Clone the repository:

    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
    
  2. Navigate to the directory and run the appropriate script:

    For Windows:

    cd stable-diffusion-webui
    webui-user.bat
    

    For Linux/macOS:

    cd stable-diffusion-webui
    bash webui.sh
    
  3. The script will automatically install required dependencies and download a default model.

  4. Access the WebUI at http://localhost:7860 in your browser.
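
Beyond the browser interface, the WebUI can also be driven over HTTP if you launch it with the --api flag (add it to COMMANDLINE_ARGS in webui-user.bat or webui-user.sh). The sketch below is a rough example against the txt2img endpoint on the default port; the payload values are illustrative:

    import base64
    import json
    import urllib.request

    payload = json.dumps({
        "prompt": "A serene landscape with mountains and a lake at sunset",
        "negative_prompt": "blurry, low quality",
        "steps": 30,
        "cfg_scale": 7.5,
        "width": 512,
        "height": 512,
    }).encode("utf-8")

    req = urllib.request.Request(
        "http://localhost:7860/sdapi/v1/txt2img",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)

    # The response contains base64-encoded PNG data in result["images"]
    with open("api_result.png", "wb") as f:
        f.write(base64.b64decode(result["images"][0]))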

Option 2: Using ComfyUI

ComfyUI offers a more customizable, node-based approach to running Stable Diffusion.

  1. Clone the repository:

    git clone https://github.com/comfyanonymous/ComfyUI.git
    
  2. Navigate to the directory:

    cd ComfyUI
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Run ComfyUI:

    python main.py
    
  5. Access the interface at http://localhost:8188 in your browser.
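
ComfyUI can also be scripted through its local HTTP API. A minimal sketch, assuming the default port (8188) and a workflow exported from the UI with “Save (API Format)”; the filename workflow_api.json is just an example:

    import json
    import urllib.request

    # Load a workflow previously exported via "Save (API Format)"
    with open("workflow_api.json", "r", encoding="utf-8") as f:
        workflow = json.load(f)

    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    req = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))  # queues the job and returns a prompt ID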

Option 3: Direct API Usage with Diffusers Library

For developers who prefer programmatic access:

  1. Install the Diffusers library:

    pip install diffusers transformers accelerate safetensors
    
  2. Create a Python script to generate images:

    import torch
    from diffusers import StableDiffusionPipeline
    
    model_id = "runwayml/stable-diffusion-v1-5"  # Choose your preferred model
    
    # Load the weights in half precision to roughly halve VRAM usage
    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")  # Move the pipeline to the GPU
    
    prompt = "A serene landscape with mountains and a lake at sunset, highly detailed"
    
    image = pipe(prompt).images[0]
    image.save("generated_image.png")
    

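If the script above runs out of GPU memory, Diffusers provides opt-in memory savers. A short sketch: call enable_model_cpu_offload() in place of pipe.to("cuda") (it relies on the accelerate package installed in step 1) and enable attention slicing to trade a little speed for a smaller peak VRAM footprint:

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.enable_attention_slicing()     # compute attention in smaller chunks
    pipe.enable_model_cpu_offload()     # keep submodules on the CPU until needed
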
Practical Exercise: Creating Images with Stable Diffusion

Step 1: Basic Text-to-Image Generation

Once you have Stable Diffusion installed, try generating your first image:

  1. If using Stable Diffusion WebUI, navigate to the “txt2img” tab.

  2. Enter a detailed prompt in the “Prompt” field, for example:

    A beautiful fantasy landscape with floating islands, waterfalls, and ancient ruins, lush vegetation, magical atmosphere, golden hour lighting, highly detailed, 8k resolution
    
  3. For better results, add a negative prompt:

    blurry, low quality, distorted, deformed, ugly, bad anatomy
    
  4. Set initial generation parameters:

    • Sampling method: DPM++ 2M Karras
    • Sampling steps: 30
    • Width/Height: 512x512 (or 1024x1024 for SDXL/SD3)
    • CFG Scale: 7.5
    • Seed: -1 (random)
  5. Click “Generate” to create your image. (A Diffusers-based sketch of these same settings follows below.)
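
If you prefer the programmatic route from Option 3, the settings above map onto Diffusers arguments roughly like this (the model ID and seed are illustrative):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    generator = torch.Generator("cuda").manual_seed(42)  # fixed seed; omit for random (seed = -1)
    image = pipe(
        prompt="A beautiful fantasy landscape with floating islands, waterfalls, "
               "and ancient ruins, lush vegetation, magical atmosphere, "
               "golden hour lighting, highly detailed, 8k resolution",
        negative_prompt="blurry, low quality, distorted, deformed, ugly, bad anatomy",
        num_inference_steps=30,   # "Sampling steps"
        guidance_scale=7.5,       # "CFG Scale"
        width=512,
        height=512,
        generator=generator,
    ).images[0]
    image.save("fantasy_landscape.png")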

Step 2: Refining Your Results

Experiment with different settings to improve your results:

  1. Adjust CFG Scale:

    • Lower values (5-7): More creative but potentially less aligned with prompt
    • Higher values (8-12): More literal interpretation of prompt
  2. Try Different Samplers (a scheduler-swap sketch follows this list):

    • DPM++ 2M Karras: Good overall quality
    • Euler a: Fast with good results
    • DDIM: Good for consistency with seeds
  3. Adjust Sampling Steps:

    • 20-30 steps is usually sufficient
    • More steps (40+) for more detailed images but diminishing returns
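
In Diffusers terms, samplers are scheduler classes that can be swapped on an existing pipeline. A minimal sketch, assuming the pipe object from Option 3 (the mapping to WebUI sampler names is approximate):

    from diffusers import (
        DPMSolverMultistepScheduler,      # roughly "DPM++ 2M"
        EulerAncestralDiscreteScheduler,  # roughly "Euler a"
        DDIMScheduler,                    # "DDIM"
    )

    # DPM++ 2M with Karras sigmas
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(
        pipe.scheduler.config, use_karras_sigmas=True
    )
    # Alternatives:
    # pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
    # pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)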

Step 3: Advanced Techniques

Once you’re comfortable with basic generation, try these advanced features:

  1. Image-to-Image: Upload an existing image and transform it based on your prompt (see the sketch after this list)

  2. Inpainting: Selectively edit parts of an image by masking areas

  3. Using ControlNet: Guide generation with edges, poses, or depth maps

  4. Prompt Engineering: Experiment with different prompt structures:

    (subject:1.2), setting, lighting, style, (details:1.1), quality modifiers
    

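Image-to-image is also available programmatically. A minimal Diffusers sketch (the input file, prompt, and strength value are illustrative):

    import torch
    from diffusers import StableDiffusionImg2ImgPipeline
    from PIL import Image

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init_image = Image.open("input.png").convert("RGB").resize((512, 512))
    image = pipe(
        prompt="the same scene repainted as a watercolor illustration",
        image=init_image,
        strength=0.6,        # 0 keeps the original; values near 1 mostly ignore it
        guidance_scale=7.5,
    ).images[0]
    image.save("img2img_result.png")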

Model Versions

  • Stable Diffusion 1.x: The original models, good for general use
  • Stable Diffusion 2.x: Improved with better text understanding
  • Stable Diffusion XL: Higher resolution with better quality
  • Stable Diffusion 3.x/3.5: Latest architecture with superior quality and prompt adherence
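
Each family uses a different pipeline class in Diffusers (and different VRAM, per the requirements above). A sketch with commonly used Hugging Face model IDs; note that the SD 3.x repositories are gated, so you must accept the license and authenticate before downloading:

    import torch
    from diffusers import StableDiffusionPipeline, StableDiffusionXLPipeline

    sd15 = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    )
    sdxl = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    )
    # SD 3.x uses StableDiffusion3Pipeline in recent Diffusers releases:
    # from diffusers import StableDiffusion3Pipeline
    # sd35 = StableDiffusion3Pipeline.from_pretrained(
    #     "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
    # )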

Suggested Projects

You might also be interested in these similar projects:

  • Fooocus: A user-friendly image generation platform based on Stable Diffusion XL with Midjourney-like simplicity (Difficulty: Beginner, Updated: Mar 1, 2025)
  • ComfyUI: A powerful node-based interface for Stable Diffusion image generation workflows (Difficulty: Intermediate, Updated: Mar 1, 2025)
  • An optimized Stable Diffusion WebUI with improved performance, reduced VRAM usage, and advanced features (Difficulty: Beginner, Updated: Mar 23, 2025)