Streamlit
Create interactive data apps with pure Python in minutes
Alternative To
- Gradio
- Dash
- Flask
- Shiny
Difficulty Level
Suitable for users with basic technical knowledge. Easy to set up and use.
Overview
Streamlit is an open-source Python framework that allows data scientists and engineers to create interactive web applications directly from Python scripts. With its simple, declarative syntax, Streamlit enables developers to transform data analysis scripts into fully functional web applications without requiring any front-end experience.
The framework follows a “script runs from top to bottom” philosophy, making it intuitive for Python users and allowing rapid iteration. As you modify your code, Streamlit automatically updates the web application, creating a seamless development experience. This approach has made Streamlit especially popular among data professionals who want to share insights, build dashboards, and create interactive tools without getting bogged down in web development details.
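To see what this looks like in practice, here is a minimal sketch of a complete app (save it as, say, app.py; the filename is arbitrary):

import streamlit as st

st.title("Hello, Streamlit!")
name = st.text_input("What's your name?", value="world")
st.write(f"Hello, {name}!")

Each time the user interacts with a widget, Streamlit reruns the script from top to bottom and redraws the page with the new values.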
System Requirements
- Python: 3.8 or higher
- CPU: 2+ cores (4+ recommended for data-intensive applications)
- RAM: 4GB+ (8GB+ recommended)
- GPU: Not required (useful for ML models)
- Storage: 1GB+ for base installation
- Operating System: Windows, macOS, or Linux
Installation Guide
Prerequisites
- Python 3.8 or higher
- Pip package manager
Basic Installation
Install Streamlit using pip:
pip install streamlit
To verify the installation and see a demo app:
streamlit hello
This will open a browser window with Streamlit’s demo application.
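To run a script of your own, point the CLI at the file:

streamlit run your_app.py

By default the app is served locally at http://localhost:8501, and Streamlit watches the source file so it can rerun the app when you save changes.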
Installation in a Virtual Environment
For a more isolated environment:
# Create a virtual environment
python -m venv streamlit-env
# Activate on Windows
streamlit-env\Scripts\activate
# Activate on macOS/Linux
source streamlit-env/bin/activate
# Install Streamlit
pip install streamlit
Installation with Conda
If you’re using Anaconda or Miniconda:
# Create a new conda environment
conda create -n streamlit-env python=3.10
# Activate the environment
conda activate streamlit-env
# Install Streamlit
pip install streamlit
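For reproducible environments, you can list dependencies in a requirements.txt and install them in one step. The set below covers the examples in this guide (the version pin is illustrative):

streamlit>=1.30
pandas
numpy
matplotlib
seaborn
scikit-learn
plotly

pip install -r requirements.txt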
Practical Exercise: Building a Data Explorer
Let’s create a simple data explorer application that allows users to upload CSV files and perform basic exploratory data analysis:
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set page configuration
st.set_page_config(
    page_title="Data Explorer",
    page_icon="📊",
    layout="wide"
)

# Add a title and description
st.title("📊 Data Explorer")
st.markdown("Upload your CSV file to explore and visualize your data.")

# File uploader
uploaded_file = st.file_uploader("Choose a CSV file", type=["csv"])

# If a file is uploaded
if uploaded_file is not None:
    try:
        # Load the data
        df = pd.read_csv(uploaded_file)

        # Show success message
        st.success(f"Successfully loaded data with {df.shape[0]} rows and {df.shape[1]} columns.")

        # Basic data information
        st.header("Data Overview")

        # Display tabs for different data views
        tab1, tab2, tab3 = st.tabs(["Data Preview", "Data Statistics", "Data Types"])

        with tab1:
            # Display data preview
            st.subheader("Data Preview")
            st.dataframe(df.head(10))

        with tab2:
            # Display descriptive statistics
            st.subheader("Descriptive Statistics")
            st.dataframe(df.describe())

        with tab3:
            # Display data types and null counts per column
            st.subheader("Data Types")
            dtypes_df = pd.DataFrame({
                'Column': df.columns,
                'Data Type': df.dtypes.astype(str),
                'Non-Null Count': df.count().values,
                'Null Count': df.isna().sum().values,
                'Null Percentage': (df.isna().sum() / len(df) * 100).round(2).astype(str) + '%'
            })
            st.dataframe(dtypes_df)

        # Visualization section
        st.header("Data Visualization")

        # Sidebar for visualization options
        st.sidebar.header("Visualization Options")

        # Select columns for visualization
        numeric_cols = df.select_dtypes(include=['int64', 'float64']).columns.tolist()
        categorical_cols = df.select_dtypes(include=['object']).columns.tolist()

        if len(numeric_cols) > 0:
            st.sidebar.subheader("Numeric Column Analysis")
            selected_num_col = st.sidebar.selectbox("Select a numeric column:", numeric_cols)

            # Distribution plot
            st.subheader(f"Distribution of {selected_num_col}")
            fig, ax = plt.subplots(figsize=(10, 6))
            sns.histplot(df[selected_num_col].dropna(), kde=True, ax=ax)
            st.pyplot(fig)

            # Box plot
            st.subheader(f"Box Plot of {selected_num_col}")
            fig, ax = plt.subplots(figsize=(10, 6))
            sns.boxplot(x=df[selected_num_col].dropna(), ax=ax)
            st.pyplot(fig)

        if len(numeric_cols) >= 2:
            st.sidebar.subheader("Correlation Analysis")
            x_col = st.sidebar.selectbox("Select X column:", numeric_cols, key="x_col")
            y_col = st.sidebar.selectbox("Select Y column:", [c for c in numeric_cols if c != x_col], key="y_col")

            # Scatter plot
            st.subheader(f"Scatter Plot: {x_col} vs {y_col}")
            fig, ax = plt.subplots(figsize=(10, 6))
            sns.scatterplot(x=df[x_col], y=df[y_col], ax=ax)
            st.pyplot(fig)

        if len(categorical_cols) > 0 and len(numeric_cols) > 0:
            st.sidebar.subheader("Category Analysis")
            cat_col = st.sidebar.selectbox("Select category column:", categorical_cols)
            num_col = st.sidebar.selectbox("Select numeric column:", numeric_cols, key="num_for_cat")

            # Only plot if the column has 10 or fewer categories
            unique_cats = df[cat_col].nunique()
            if unique_cats <= 10:
                # Bar plot
                st.subheader(f"Bar Plot: Average {num_col} by {cat_col}")
                fig, ax = plt.subplots(figsize=(12, 6))
                df.groupby(cat_col)[num_col].mean().sort_values().plot(kind='bar', ax=ax)
                st.pyplot(fig)

                # Grouped box plot
                st.subheader(f"Box Plot: {num_col} by {cat_col}")
                fig, ax = plt.subplots(figsize=(12, 6))
                sns.boxplot(x=cat_col, y=num_col, data=df, ax=ax)
                ax.tick_params(axis='x', rotation=45)
                st.pyplot(fig)
    except Exception as e:
        st.error(f"Error: {e}")
else:
    # Show example datasets
    st.info("No file uploaded. You can use one of the example datasets below:")
    example_datasets = {
        "Iris Dataset": "https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv",
        "Titanic Dataset": "https://raw.githubusercontent.com/datasciencedojo/datasets/master/titanic.csv",
        "Boston Housing": "https://raw.githubusercontent.com/selva86/datasets/master/BostonHousing.csv"
    }
    selected_example = st.selectbox("Select an example dataset:", list(example_datasets.keys()))
    if st.button("Load Example Dataset"):
        with st.spinner("Loading example dataset..."):
            df = pd.read_csv(example_datasets[selected_example])
        st.success(f"Successfully loaded {selected_example} with {df.shape[0]} rows and {df.shape[1]} columns.")
        st.dataframe(df.head(10))
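Save the script as, for example, data_explorer.py and launch it with:

streamlit run data_explorer.py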
Advanced Example: Building a Machine Learning App
Here’s a more advanced example that creates an interactive machine learning app for classification tasks:
import streamlit as st
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pickle
import plotly.express as px
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.preprocessing import LabelEncoder

# Set page configuration
st.set_page_config(
    page_title="ML Classification App",
    page_icon="🤖",
    layout="wide"
)

# Page title and description
st.title("🤖 Interactive Machine Learning App")
st.markdown("""
This app allows you to train a machine learning model on your data and visualize the results.
Upload your CSV file, select the target variable and features, and let the app do the rest!
""")

# Cache data loading to avoid re-reading the file on every rerun
@st.cache_data
def load_data(file):
    return pd.read_csv(file)

# Sidebar for ML settings
st.sidebar.header("Model Settings")

# Upload data
uploaded_file = st.sidebar.file_uploader("Upload CSV file", type=["csv"])

# Main section
if uploaded_file is not None:
    # Load data
    df = load_data(uploaded_file)

    # Show data overview
    st.subheader("Data Preview")
    st.dataframe(df.head())

    # Data preprocessing
    st.subheader("Data Preprocessing")

    # Get list of columns
    columns = df.columns.tolist()

    # Select target variable
    target_column = st.selectbox("Select target variable (categorical):", columns)

    # Warn if the target has many classes
    if df[target_column].nunique() > 10:
        st.warning(f"Warning: the selected target has {df[target_column].nunique()} unique values. Classification works best with fewer classes.")

    # Encode the target variable if necessary
    if df[target_column].dtype == 'object':
        le = LabelEncoder()
        df[f"{target_column}_encoded"] = le.fit_transform(df[target_column])
        st.info(f"Target variable '{target_column}' has been encoded for modeling.")
        target_classes = dict(zip(le.transform(le.classes_), le.classes_))
        st.write("Encoding mapping:", target_classes)
        target_column_for_model = f"{target_column}_encoded"
    else:
        target_column_for_model = target_column

    # Select features
    feature_columns = [col for col in columns if col != target_column]
    selected_features = st.multiselect("Select features for training:",
                                       feature_columns,
                                       default=feature_columns[:min(5, len(feature_columns))])

    # Only continue if features are selected
    if len(selected_features) > 0:
        # Feature preprocessing
        numeric_features = df[selected_features].select_dtypes(include=['int64', 'float64']).columns.tolist()
        categorical_features = df[selected_features].select_dtypes(include=['object']).columns.tolist()

        # Keep only the selected features, then handle categorical ones
        df_processed = df[selected_features].copy()
        if len(categorical_features) > 0:
            st.subheader("Categorical Feature Encoding")
            for cat_feat in categorical_features:
                # One-hot encode and replace the original column
                df_encoded = pd.get_dummies(df_processed[cat_feat], prefix=cat_feat)
                df_processed = pd.concat([df_processed.drop(cat_feat, axis=1), df_encoded], axis=1)
            st.success(f"Encoded {len(categorical_features)} categorical features using one-hot encoding.")

        # Final feature set
        X_columns = df_processed.columns.tolist()

        # Display correlation matrix for numeric features
        if len(numeric_features) > 1:
            st.subheader("Feature Correlation")
            corr = df[numeric_features].corr()
            fig, ax = plt.subplots(figsize=(10, 8))
            sns.heatmap(corr, annot=True, cmap='coolwarm', ax=ax)
            st.pyplot(fig)

        # Model training settings
        st.sidebar.subheader("Training Settings")
        test_size = st.sidebar.slider("Test set size", 0.1, 0.5, 0.2, 0.05)
        random_state = st.sidebar.slider("Random state", 0, 100, 42)
        n_estimators = st.sidebar.slider("Number of trees", 10, 500, 100, 10)

        # Split data
        X = df_processed[X_columns]
        y = df[target_column_for_model]
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=random_state)

        # Model training
        train_button = st.button("Train Model")
        if train_button:
            with st.spinner("Training model..."):
                # Train model
                model = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
                model.fit(X_train, y_train)

                # Make predictions
                y_pred = model.predict(X_test)

            # Evaluate model
            st.subheader("Model Performance")

            # Accuracy
            accuracy = accuracy_score(y_test, y_pred)
            st.metric("Accuracy", f"{accuracy:.4f}")

            # Classification report
            report = classification_report(y_test, y_pred, output_dict=True)
            st.dataframe(pd.DataFrame(report).transpose())

            # Confusion matrix
            st.subheader("Confusion Matrix")
            cm = confusion_matrix(y_test, y_pred)

            # Use the original class names if the target was encoded
            if df[target_column].dtype == 'object':
                class_names = le.classes_
            else:
                class_names = [str(c) for c in np.unique(y)]

            # Plot confusion matrix
            fig, ax = plt.subplots(figsize=(10, 8))
            sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                        xticklabels=class_names, yticklabels=class_names, ax=ax)
            ax.set_ylabel('Actual')
            ax.set_xlabel('Predicted')
            st.pyplot(fig)

            # Feature importance
            st.subheader("Feature Importance")
            feature_importance = pd.DataFrame({
                'Feature': X_columns,
                'Importance': model.feature_importances_
            }).sort_values('Importance', ascending=False)
            fig = px.bar(feature_importance, x='Importance', y='Feature', orientation='h',
                         title='Feature Importance')
            st.plotly_chart(fig)

            # Allow downloading the trained model as a pickle file
            model_pickle = pickle.dumps(model)
            st.download_button(
                label="Download trained model",
                data=model_pickle,
                file_name="random_forest_model.pkl",
                mime="application/octet-stream"
            )
    else:
        st.warning("Please select at least one feature to train the model.")
else:
    st.info("Please upload a CSV file to get started.")

    # Show example usage
    st.subheader("Example Usage")
    st.markdown("""
    1. Upload a CSV file with your data
    2. Select the target variable (what you want to predict)
    3. Select the features to use for prediction
    4. Adjust model parameters in the sidebar
    5. Click 'Train Model' to see the results
    """)

    # Sample datasets
    st.subheader("Sample Datasets")
    st.markdown("""
    - [Iris Dataset](https://archive.ics.uci.edu/ml/datasets/iris) (Classification)
    - [Wine Quality Dataset](https://archive.ics.uci.edu/ml/datasets/wine+quality) (Classification)
    - [Breast Cancer Wisconsin Dataset](https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)) (Classification)
    """)
Key Features
Streamlit provides numerous features that make it ideal for data applications:
- Pure Python Development: Build complete web applications without HTML, CSS, or JavaScript
- Live Reloading: Changes automatically reflect in the app when you save your script
- Rich Widget Library: Extensive collection of UI components for user input and interaction
- Data Visualization Support: Native integration with popular plotting libraries
- Caching Mechanism: Performance optimization for data-heavy applications
- Layout Options: Columns, tabs, expandable sections, and sidebar for UI organization
- File Uploads and Downloads: Easy handling of file operations
- Session State: Persistent state management across reruns (see the sketch after this list)
- Multi-page Applications: Support for building applications with multiple pages
- Component Ecosystem: Extensible with custom components from the community
- Theme Customization: Configurable appearance and branding
- Authentication: User authentication capabilities for secure applications
- Cloud Deployment: Free hosting for public apps via Streamlit Community Cloud
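As a quick illustration of session state, here is a minimal counter app: each button click triggers a rerun of the script, but values stored in st.session_state survive the rerun.

import streamlit as st

# Initialize persistent state on the first run only
if "count" not in st.session_state:
    st.session_state.count = 0

# Clicking the button causes a rerun; the stored counter persists
if st.button("Increment"):
    st.session_state.count += 1

st.write(f"Button pressed {st.session_state.count} times")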
Resources
Official Resources
- [Streamlit Documentation](https://docs.streamlit.io)
- [GitHub Repository](https://github.com/streamlit/streamlit)
- [Streamlit Community Cloud](https://streamlit.io/cloud) - Free hosting platform
- [Streamlit Gallery](https://streamlit.io/gallery) - Example applications
- [Streamlit Blog](https://blog.streamlit.io) - Tutorials and updates
Community Resources
- [Streamlit Components](https://streamlit.io/components) - Community extensions
- [Streamlit Forum](https://discuss.streamlit.io) - Community discussions
- Streamlit Cheat Sheet - Quick reference
- Streamlit YouTube Channel - Video tutorials
- [Awesome Streamlit](https://github.com/MarcSkovMadsen/awesome-streamlit) - Curated resources
Suggested Projects
You might also be interested in these similar projects:
Self-host Supervision, a Python library with reusable computer vision tools for easy annotation, detection, tracking, and dataset management