Computer Vision
👁️

Supervision

Self-host Supervision, a Python library with reusable computer vision tools for easy annotation, detection, tracking, and dataset management

Beginner open-source self-hosted python object-detection annotation

Alternative To

  • Google Cloud Vision
  • AWS Rekognition

Difficulty Level

Beginner

Suitable for users with basic technical knowledge. Easy to set up and use.

Overview

Supervision is an open-source, model-agnostic Python library by Roboflow that provides reusable computer vision tools. Whether you need to load datasets, annotate images or videos, track objects, count detections in specific zones, or convert between annotation formats, Supervision offers a suite of utilities to simplify computer vision workflows.

System Requirements

  • Python: 3.8 or higher
  • CPU: 2+ cores
  • RAM: 4GB+
  • GPU: Optional, but recommended for running detection models
  • Storage: 1GB+ for the library and dependencies

Installation Guide

Prerequisites

  • Python 3.8 or higher installed on your system
  • pip package manager
  • (Optional) Virtual environment like venv or conda

Standard Installation

The easiest way to install Supervision is using pip:

pip install supervision

To display annotated results in desktop windows (for example with cv2.imshow), install the desktop extra, which pulls in the full OpenCV build instead of the headless one:

pip install "supervision[desktop]"

Installation from Source

If you need the latest development version or want to contribute:

# Clone the repository
git clone https://github.com/roboflow/supervision.git
cd supervision

# Setup Python environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install the package
pip install -e "."

Practical Exercise: Getting Started with Supervision

Let’s create a simple object detection visualization pipeline using Supervision and YOLOv8.

Step 1: Install Required Packages

pip install supervision ultralytics

Step 2: Create a Simple Detection and Visualization Script

Create a file named detect_and_visualize.py with the following code:

import cv2
import supervision as sv
from ultralytics import YOLO

# Load a pre-trained YOLOv8 model
model = YOLO("yolov8n.pt")

# Load an image
image_path = "path/to/your/image.jpg"  # Replace with your image
image = cv2.imread(image_path)

# Run inference
results = model(image)[0]

# Convert results to Supervision's Detections format
detections = sv.Detections.from_ultralytics(results)

# Create a box annotator
box_annotator = sv.BoxAnnotator()

# Annotate the image
annotated_image = box_annotator.annotate(
    scene=image.copy(),
    detections=detections
)

# Save the annotated image
cv2.imwrite("annotated_image.jpg", annotated_image)

# Display the annotated image
cv2.imshow("Detections", annotated_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Step 3: Expanding to Video Processing

Create another file named process_video.py:

import cv2
import supervision as sv
from ultralytics import YOLO

# Load a pre-trained YOLOv8 model
model = YOLO("yolov8n.pt")

# Open video
video_path = "path/to/your/video.mp4"  # Replace with your video
video_info = sv.VideoInfo.from_video_path(video_path)
frame_generator = sv.get_video_frames_generator(video_path)

# Create annotators
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()

# Create video writer
with sv.VideoSink(target_path="annotated_video.mp4", video_info=video_info) as sink:
    # Process each frame
    for frame in frame_generator:
        # Run inference
        results = model(frame)[0]
        detections = sv.Detections.from_ultralytics(results)

        # Filter detections if needed (e.g., by confidence)
        detections = detections[detections.confidence > 0.5]

        # Annotate the frame with boxes and class labels
        annotated_frame = box_annotator.annotate(scene=frame.copy(), detections=detections)
        annotated_frame = label_annotator.annotate(scene=annotated_frame, detections=detections)

        # Add to video
        sink.write_frame(annotated_frame)

Step 4: Exploring Advanced Features

Supervision offers many more features, including:

  1. Object Tracking:
    • Combine Supervision with ByteTrack or other trackers to track objects across video frames
  2. Zone Counting:
    • Define zones and count objects crossing lines or entering/exiting areas
  3. Dataset Management:
    • Convert between annotation formats (COCO, YOLO, Pascal VOC)
    • Split datasets for training/validation
  4. Custom Visualizations:
    • Create custom annotation styles for different detection types

Resources

Official Documentation

The official documentation is comprehensive and includes tutorials, API references, and examples:

Read the Documentation

Cheatsheet

A quick reference guide for Supervision’s most common functions:

Supervision Cheatsheet

Community Support

Join the community to get help, share experiences, and contribute to the project:

GitHub Issues

Tutorials and Examples

Explore official tutorials and examples:

Supervision Examples

Roboflow Blog Tutorials

Suggested Projects

You might also be interested in these similar projects:

👁️

YOLOv8

Self-host YOLOv8, a state-of-the-art real-time object detection and image segmentation model for computer vision applications

Difficulty: Intermediate
Updated: Mar 1, 2025
🎛️

Gradio

Build and share interactive ML model demos with simple Python code

Difficulty: Beginner
Updated: Mar 3, 2025
🐍

Rio

Build web apps and GUIs in pure Python with no HTML, CSS, or JavaScript required

Difficulty: Beginner
Updated: Mar 3, 2025