Text Generation WebUI
Self-host Text Generation WebUI, a powerful Gradio web interface for running large language models locally with multiple inference backends
Alternative To
- ChatGPT
- Claude
- OpenAI API
Difficulty Level
Requires some technical experience. Moderate setup complexity.
Overview
Text Generation WebUI is an open-source Gradio-based web interface for running Large Language Models (LLMs) locally. Created by oobabooga, it supports multiple text generation backends including Transformers, llama.cpp, and ExLlamaV2. The interface provides multiple chat modes, model fine-tuning capabilities, and an OpenAI-compatible API for seamless integration with other applications.
System Requirements
- CPU: 4+ cores (8+ cores recommended for larger models)
- RAM: 16GB+ (32GB+ recommended for larger models)
- GPU:
- NVIDIA GPU with 8GB+ VRAM for smaller models (7B parameters)
- NVIDIA GPU with 24GB+ VRAM for larger models (up to 13B parameters)
- Multiple GPUs or high-end GPUs for very large models (30B+ parameters)
- Storage: 20GB+ (100GB+ recommended for storing multiple models)
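The sizing guidance above can be sanity-checked with a rough rule of thumb (an assumption for illustration, not from the project docs): a model's weights take roughly parameter count × bytes per weight, plus runtime overhead for the KV cache and buffers.

```python
def estimate_memory_gb(n_params_billion: float, bits_per_weight: int,
                       overhead: float = 1.2) -> float:
    """Rough memory footprint of a model's weights plus runtime overhead.

    n_params_billion: parameter count in billions (e.g. 7 for a 7B model)
    bits_per_weight:  16 for fp16; 8 or 4 for quantized formats like GGUF
    overhead:         multiplier for KV cache and buffers (assumed value)
    """
    bytes_per_weight = bits_per_weight / 8
    return n_params_billion * bytes_per_weight * overhead

# A 7B model quantized to 4 bits fits comfortably in 8 GB of VRAM:
print(f"7B @ 4-bit: ~{estimate_memory_gb(7, 4):.1f} GB")
# The same model at fp16 needs roughly 17 GB:
print(f"7B @ 16-bit: ~{estimate_memory_gb(7, 16):.1f} GB")
```

This is why the table above pairs 8GB+ VRAM with 7B models: at 4-bit quantization the weights alone are around 4 GB, leaving headroom for context and buffers.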
Installation Guide
Prerequisites
- Python 3.8 or higher
- Git
- NVIDIA CUDA drivers if using an NVIDIA GPU
- Conda/Miniconda (recommended for environment management)
Option 1: One-Click Installers (Simplest)
For Windows, macOS, and Linux users, oobabooga offers one-click installers that automate the entire setup process:
Download the appropriate installer for your system from: One-Click Installers Repository
Run the installer and follow the on-screen instructions.
Once installed, the interface will be accessible at:
http://localhost:7860
Option 2: Manual Installation
Clone the repository:
git clone https://github.com/oobabooga/text-generation-webui
cd text-generation-webui
Create and activate a conda environment:
conda create -n textgen python=3.10
conda activate textgen
Install the dependencies based on your system:
# For NVIDIA GPUs
pip install -r requirements.txt
# For CPU-only systems
pip install -r requirements_cpu_only.txt
Start the web UI:
# Basic startup
python server.py
# Or with additional parameters
python server.py --listen --chat
Option 3: Docker Installation
Pull the Docker image:
docker pull atinoda/text-generation-webui:default-nvidia
Run the container:
docker run -it --rm -e EXTRA_LAUNCH_ARGS="--listen --verbose" --gpus all -p 7860:7860 atinoda/text-generation-webui:default-nvidia
Practical Exercise: Running Your First LLM
Let’s set up and run a language model with Text Generation WebUI.
Step 1: Download a Model
After installing Text Generation WebUI, you’ll need to download a model:
Start the web UI:
python server.py
Open your browser and navigate to:
http://localhost:7860
Go to the “Model” tab
Enter a Hugging Face model repository, such as one published by “TheBloke”. For beginners, try one of these smaller models:
- TheBloke/Llama-2-7B-Chat-GGUF
- TheBloke/Mistral-7B-Instruct-v0.2-GGUF
Select a quantization level based on your GPU VRAM (4-bit for lower VRAM, 8-bit for more VRAM)
Click “Download” and wait for the model to download
Step 2: Load the Model
Once downloaded, select your model from the “Model” dropdown
Set appropriate parameters:
- Context length: 2048 (or higher if your model supports it)
- n_gpu_layers: Set to -1 to offload all layers to GPU
- n_batch: 512 (reduce if you encounter memory issues)
Click “Load” to load the model into memory
Step 3: Chat with the Model
Navigate to the “Chat” tab
Select “Chat-Instruct” mode from the dropdown for instruction-following models
Start a conversation by typing a message and pressing Enter
Experiment with different parameters in the “Generation” section to control output:
- Temperature: Controls randomness (lower for more deterministic responses)
- Top P: Controls diversity of responses
- Max new tokens: Controls maximum response length
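To build intuition for what Temperature and Top P actually do, here is a toy sketch (not the WebUI's own sampling code) of temperature scaling followed by nucleus (Top P) filtering over a next-token distribution:

```python
import math

def sample_filter(logits, temperature=1.0, top_p=1.0):
    """Show which tokens survive temperature scaling + top-p filtering.

    Returns (token_index, probability) pairs, most likely first.
    Toy illustration only, not the WebUI's implementation.
    """
    # Temperature scaling: lower values sharpen the distribution,
    # making the most likely token dominate (more deterministic output).
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = sorted(((i, e / total) for i, e in enumerate(exps)),
                   key=lambda p: p[1], reverse=True)
    # Nucleus (top-p) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p; the rest are never sampled.
    kept, cum = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cum += p
        if cum >= top_p:
            break
    return kept

logits = [2.0, 1.0, 0.5, 0.1]
print(sample_filter(logits, temperature=0.5, top_p=0.9))  # sharp: 2 tokens survive
print(sample_filter(logits, temperature=2.0, top_p=0.9))  # flat: all 4 survive
```

Lowering Temperature concentrates probability on the top token, while lowering Top P cuts the long tail of unlikely tokens; together they trade creativity for consistency.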
Step 4: Exploring Advanced Features
Once you’re comfortable with the basics, try these advanced features:
Character Creation:
- Create custom characters with specific personalities
- Define custom chat templates and formats
Extensions:
- Enable extensions like API, Memory, or Image Generation from the “Extensions” tab
Notebook Mode:
- Use the “Default” tab for free-form text generation without chat context
API Integration:
- Enable the API extension to integrate with other applications
- Use the OpenAI-compatible endpoint at
http://localhost:5000/v1
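As a minimal sketch of this integration, assuming the server was started with the API enabled (e.g. `python server.py --api`) and is listening on the default port 5000, a client can call the OpenAI-style chat completions endpoint using only the Python standard library:

```python
import json
import urllib.request

# Default API address; adjust the port if you changed it at launch.
API_URL = "http://localhost:5000/v1/chat/completions"

def build_request(prompt: str, temperature: float = 0.7,
                  max_tokens: int = 200) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

def chat(prompt: str) -> str:
    """Send the prompt to the locally running WebUI and return the reply."""
    payload = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-compatible response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]

# Usage (requires the server running with the API enabled):
#   print(chat("Explain quantization in one sentence."))
```

Because the endpoint mimics the OpenAI API shape, existing OpenAI client libraries can generally be pointed at the local base URL instead of api.openai.com.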
Resources
Official Documentation
The GitHub repository contains detailed instructions and documentation.
Community Support
Join the community to get help, share experiences, and contribute to the project.
Extensions
Explore available extensions to enhance functionality:
Text Generation WebUI Extensions
Tutorials and Guides
Community tutorials cover various aspects of the project.
Suggested Projects
You might also be interested in these similar projects:
A natural language interface that lets LLMs run code on your computer
Framework for developing context-aware applications powered by large language models (LLMs)
CrewAI is a standalone Python framework for orchestrating role-playing, autonomous AI agents that collaborate intelligently to tackle complex tasks through defined roles, tools, and workflows.