Production-ready implementation of "Colorful Image Colorization" (Zhang, Isola, Efros - ECCV 2016) with interactive demos, memory-safe training, and comprehensive tooling.
- Paper-Accurate Implementation: Classification-based colorization with 313-bin quantized ab space, class rebalancing, and annealed-mean decoding (T=0.38)
- Interactive UIs: Beautiful Streamlit and Gradio frontends with blend animations, temperature sliders, and method comparison
- Production-Ready: Docker support, Redis caching, memory safeguards (FP16, gradient checkpointing, tiling)
- Memory-Safe: Automatic OOM handling for RTX 3060 6GB and similar GPUs
- Multiple Methods: Classification (paper), L2 regression baseline, OpenCV color transfer
- Fully Tested: Unit tests, integration tests, CI/CD with GitHub Actions
- Cross-Platform: Works on Linux, Windows (WSL2), and macOS with local venv or Docker
# Clone repository
git clone https://github.com/yourusername/colorization.git
cd colorization
# Run setup script
.\scripts\setup_local.ps1
# Activate environment
.\.venv\Scripts\Activate.ps1
# Install PyTorch with CUDA 13.0
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130
# Verify GPU
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Clone repository
git clone https://github.com/yourusername/colorization.git
cd colorization
# Run setup script
chmod +x scripts/*.sh
./scripts/setup_local.sh
# Activate environment
source .venv/bin/activate
# Install PyTorch with CUDA 13.0
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130
# Verify GPU
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
# Build and run with Docker Compose
cd docker
docker-compose up --build
# Access UIs
# Streamlit: http://localhost:8501
# Gradio: http://localhost:7860
Prerequisites for Docker GPU support:
- Docker Desktop (Windows) or Docker Engine (Linux)
- NVIDIA Container Toolkit (installation guide)
- NVIDIA GPU drivers
# Windows
.\scripts\run_streamlit.ps1
# Linux/macOS
./scripts/run_streamlit.sh
# Or directly
python -m streamlit run src/ui/streamlit_app.py --server.port 8501
# Windows
.\scripts\run_gradio.ps1
# Linux/macOS
./scripts/run_gradio.sh
# Or directly
python src/ui/gradio_app.py --port 7860
Access the UIs:
- Streamlit: http://localhost:8501
- Gradio: http://localhost:7860
# Prepare your dataset
# Structure: data/train/<images>, data/val/<images>
# Compute color statistics (optional but recommended)
python -m src.data.dataset data/train --output data/color_stats.npz
# Train with quick config
python -m src.train \
--config configs/quicktrain.yaml \
--train_dir data/train \
--val_dir data/val
# For ImageNet-scale training
python -m src.train \
--config configs/fulltrain.yaml \
--train_dir /path/to/imagenet/train \
--val_dir /path/to/imagenet/val
Training Configuration:
Edit configs/quicktrain.yaml or configs/fulltrain.yaml:
model:
model_type: "mobile" # Options: paper, mobile, l2
num_classes: 313
base_channels: 32
num_epochs: 50
batch_size: 16
learning_rate: 1e-4
use_amp: true  # FP16 mixed precision
Memory Management:
- Model automatically scales the batch size down if OOM occurs
- Use model_type: mobile for 6GB GPUs
- Use model_type: paper for 12GB+ GPUs
- FP16 mixed precision reduces memory by ~40%
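For illustration, here is roughly what use_amp: true corresponds to in plain PyTorch: a minimal sketch with a toy model and synthetic data, not the project's actual trainer in src/train.py.

```python
import torch
from torch import nn

# Toy model standing in for the colorization network; any nn.Module
# behaves the same way under autocast.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 4).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)
target = torch.randn(8, 4, device=device)

opt.zero_grad()
with torch.autocast(device_type=device, enabled=(device == "cuda")):
    loss = nn.functional.mse_loss(model(x), target)  # forward pass in FP16 on GPU
scaler.scale(loss).backward()  # scale loss so small FP16 grads don't underflow
scaler.step(opt)               # unscales grads; skips the step on inf/NaN
scaler.update()
```

On CPU the `enabled=False` flags turn both autocast and the scaler into no-ops, so the same loop runs everywhere.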
# Single image
python -m src.infer \
input.jpg \
--output colorized.jpg \
--model checkpoints/best_model.pth \
--method classification \
--temperature 0.38
# Batch processing
python -m src.infer \
input_folder/ \
--output output_folder/ \
--model checkpoints/best_model.pth \
--method classification
from src.infer import ColorizationInference
from PIL import Image
# Initialize engine
engine = ColorizationInference(
model_path="checkpoints/best_model.pth",
device="cuda", # or "cpu"
use_cache=True,
redis_url="redis://localhost:6379"
)
# Colorize image
img = Image.open("grayscale.jpg")
result = engine.colorize_image(
img,
method="classification",
temperature=0.38
)
# Save result
result_img = Image.fromarray((result * 255).astype('uint8'))
result_img.save("colorized.jpg")
# Create blend animation
frames = engine.create_blend_animation(
img,
num_frames=30
)
result = engine.colorize_image(img, method="classification", temperature=0.38)
- Description: Treats colorization as per-pixel classification over 313 quantized ab bins
- Features: Class rebalancing, soft-encoding, annealed-mean decoding
- Temperature:
- Lower (0.1-0.3): More vibrant, saturated colors
- Default (0.38): Balanced (paper recommendation)
- Higher (0.5-1.0): More conservative, desaturated
result = engine.colorize_image(img, method="l2")
- Direct regression to ab values
- Simpler but tends to produce desaturated results
- Useful for comparison
result = engine.colorize_image(img, method="opencv")
- Simple baseline using traditional CV techniques
- No neural network required
- Limited quality
- Image Upload: Drag & drop or browse
- Method Selector: Switch between classification, L2, OpenCV
- Temperature Slider: Adjust color vibrancy (0.01-1.0)
- Blend Animation: Smooth grayscale-to-color transition
- Side-by-Side Comparison: Visual before/after
- Download Results: Save colorized images
- System Monitor: GPU memory usage, cache stats
- All Streamlit features
- Animation Gallery: View all blend frames
- Real-time Blend Slider: Interactive color mixing
- Shareable Links: Public demo URLs (with --share)
# Build image
docker build -t colorize-app:latest -f docker/Dockerfile .
# Run Streamlit
docker run --gpus all -p 8501:8501 colorize-app:latest
# Run Gradio
docker run --gpus all -p 7860:7860 colorize-app:latest \
python src/ui/gradio_app.py --port 7860
# Or use Docker Compose (recommended)
docker-compose up
# docker-compose.yml includes:
- redis: Caching service (port 6379)
- app-streamlit: Streamlit UI (port 8501)
- app-gradio: Gradio UI (port 7860)
# Run all tests
pytest src/tests/ -v
# Run with coverage
pytest src/tests/ --cov=src --cov-report=html
# Run specific test file
pytest src/tests/test_ops.py -v
# Run integration tests only
pytest src/tests/test_integration.py -v
colorization/
├── src/
│   ├── models/
│   │   ├── model.py          # PaperNet, MobileLiteVariant, L2RegressionNet
│   │   └── ops.py            # Quantization, encoding, color conversion
│   ├── data/
│   │   ├── dataset.py        # Dataset loaders
│   │   └── transforms.py     # Data augmentation
│   ├── ui/
│   │   ├── streamlit_app.py  # Streamlit interface
│   │   └── gradio_app.py     # Gradio interface
│   ├── cache/
│   │   └── redis_client.py   # Redis caching with disk fallback
│   ├── utils/
│   │   ├── memory.py         # Memory management, tiling
│   │   └── logger.py         # TensorBoard and file logging
│   ├── tests/                # Unit and integration tests
│   ├── train.py              # Training script
│   └── infer.py              # Inference engine
├── configs/
│   ├── quicktrain.yaml       # Quick training config
│   └── fulltrain.yaml        # Full paper training config
├── docker/
│   ├── Dockerfile            # CUDA 13.0 Docker image
│   └── docker-compose.yml    # Multi-service setup
├── scripts/
│   ├── setup_local.sh        # Linux/macOS setup
│   ├── setup_local.ps1       # Windows setup
│   ├── run_streamlit.sh      # Launch Streamlit
│   ├── run_gradio.sh         # Launch Gradio
│   └── verify_system.sh      # System verification
├── .github/workflows/
│   └── ci.yml                # GitHub Actions CI
├── requirements.txt          # Python dependencies
├── requirements-dev.txt      # Development dependencies
└── README.md                 # This file
# configs/quicktrain.yaml
model:
model_type: "mobile" # paper | mobile | l2
num_classes: 313 # Number of ab bins
base_channels: 32  # Channel multiplier (mobile/l2 only)
- Optimizer: Adam (β₁=0.9, β₂=0.99)
- Weight Decay: 1e-3
- Learning Rate Schedule:
- Initial: 3e-5
- → 1e-5 at 200k iterations
- → 3e-6 at 375k iterations
- Batch Size: 32 (adjust for your GPU)
- Image Size: 256×256
- Quantization: Grid size 10 → 313 in-gamut bins
- Soft-encoding: K=5 neighbors, σ=5
- Class Rebalancing: σ=5, λ=0.5
- Annealed-mean Temperature: T=0.38
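The learning-rate schedule above is piecewise-constant in the iteration count; as a sketch (illustrative only, not the project's actual scheduler):

```python
def learning_rate(iteration: int) -> float:
    """Piecewise-constant LR: 3e-5 initially, dropping at 200k and 375k iterations."""
    if iteration < 200_000:
        return 3e-5
    if iteration < 375_000:
        return 1e-5
    return 3e-6

# LR seen at a few milestones
print(learning_rate(0), learning_rate(250_000), learning_rate(400_000))
# → 3e-05 1e-05 3e-06
```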
| Model Type | GPU Memory | Batch Size | Training Speed |
|---|---|---|---|
| Mobile (32ch) | 4-6 GB | 16 | Fast |
| Mobile (64ch) | 8-10 GB | 16 | Fast |
| Paper (full) | 10-14 GB | 16 | Moderate |
| Paper (full) | 14-20 GB | 32 | Moderate |
Memory Saving Techniques:
- FP16 mixed precision (use_amp: true)
- Gradient checkpointing (automatic for large models)
- Auto batch size reduction on OOM
- Mobile variant (fewer parameters)
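The auto batch-size reduction amounts to a retry loop that halves the batch whenever a step runs out of memory. A minimal sketch with a toy train_step (a real trainer would catch the CUDA OOM error rather than MemoryError):

```python
def train_step(batch_size: int) -> float:
    """Toy stand-in: pretend any batch larger than 8 exhausts GPU memory."""
    if batch_size > 8:
        raise MemoryError("CUDA out of memory (simulated)")
    return 0.123  # pretend loss

def train_with_oom_fallback(batch_size: int, min_batch_size: int = 1) -> tuple[int, float]:
    """Halve the batch size on OOM until a step succeeds."""
    while batch_size >= min_batch_size:
        try:
            return batch_size, train_step(batch_size)
        except MemoryError:
            batch_size //= 2  # retry with half the batch
    raise RuntimeError("OOM even at minimum batch size")

print(train_with_oom_fallback(32))  # → (8, 0.123): starts at 32, settles at 8
```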
| Image Size | GPU Memory | Tile Size |
|---|---|---|
| 256×256 | <1 GB | Not needed |
| 512×512 | 1-2 GB | Not needed |
| 1024×1024 | 3-4 GB | 512 recommended |
| 2048×2048 | 8+ GB | 512 required |
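Tiled inference splits the image into fixed-size patches, colorizes each independently, and stitches the results back together. A minimal sketch with a toy colorize_fn and no overlap blending (the project's tiling lives in src/utils/memory.py and may differ):

```python
import numpy as np

def colorize_tiled(gray: np.ndarray, colorize_fn, tile: int = 512) -> np.ndarray:
    """Apply colorize_fn to tile x tile patches of an (H, W) grayscale image,
    returning an (H, W, 3) result. Edge tiles may be smaller than `tile`."""
    h, w = gray.shape
    out = np.zeros((h, w, 3), dtype=np.float32)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            patch = gray[y:y + tile, x:x + tile]
            out[y:y + patch.shape[0], x:x + patch.shape[1]] = colorize_fn(patch)
    return out

# Toy colorizer: replicate the gray value into all three channels
gray = np.random.rand(1024, 1024).astype(np.float32)
result = colorize_tiled(gray, lambda p: np.repeat(p[..., None], 3, axis=-1), tile=512)
print(result.shape)  # (1024, 1024, 3)
```

Peak memory is then bounded by the tile size rather than the full image, which is why 2048×2048 inputs fit once tiling kicks in.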
Use tiling for large images:
result = engine.colorize_image(img, tile_size=512)
# Solution 1: Use mobile variant
model:
model_type: "mobile"
base_channels: 16 # Reduce from 32
# Solution 2: Reduce batch size
batch_size: 8 # Down from 16
# Solution 3: Enable gradient checkpointing
# (automatically enabled for paper model)
# Solution 4: Use CPU
device: "cpu"
# Verify CUDA installation
nvidia-smi
# Reinstall PyTorch with correct CUDA version
pip3 uninstall torch torchvision
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/cu130
# Start Redis locally
# Linux/macOS
redis-server
# Windows (with WSL2)
sudo service redis-server start
# Or disable caching
# In Python:
engine = ColorizationInference(use_cache=False)
# Ensure you're in the project root directory
cd /path/to/colorization
# Install in development mode
pip install -e .
# Or add to PYTHONPATH
export PYTHONPATH="${PYTHONPATH}:$(pwd)" # Linux/macOS
$env:PYTHONPATH += ";$(pwd)"  # Windows PowerShell
- ab space: Grid size 10 → covers [-110, 110] range
- In-gamut filtering: Only 313 of 441 bins represent valid RGB colors
- Soft-encoding: Gaussian kernel (σ=5) on K=5 nearest neighbors
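The soft-encoding step can be sketched over a toy bin grid. Here the grid is a plain stand-in; the actual centers come from the in-gamut quantization in src/models/ops.py:

```python
import numpy as np

def soft_encode(ab: np.ndarray, centers: np.ndarray, k: int = 5, sigma: float = 5.0) -> np.ndarray:
    """Return a (Q,) soft label: Gaussian weights on the k nearest bin centers."""
    d2 = ((centers - ab) ** 2).sum(axis=1)       # squared distance to each bin center
    nearest = np.argsort(d2)[:k]                  # indices of the k closest bins
    w = np.exp(-d2[nearest] / (2 * sigma ** 2))   # Gaussian kernel weights
    z = np.zeros(len(centers))
    z[nearest] = w / w.sum()                      # normalize to a distribution
    return z

# Toy ab grid with spacing 10 (stand-in for the 313 in-gamut centers)
a, b = np.meshgrid(np.arange(-110, 120, 10), np.arange(-110, 120, 10))
centers = np.stack([a.ravel(), b.ravel()], axis=1).astype(np.float32)

z = soft_encode(np.array([17.0, -42.0]), centers)
print(z.sum(), (z > 0).sum())  # sums to 1, exactly 5 nonzero bins
```

Spreading mass over neighbors instead of one-hot encoding gives the classifier a smoother target and tolerates quantization error.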
1. Compute empirical distribution p from training set
2. Smooth with Gaussian: p_smooth = GaussianFilter(p, σ=5)
3. Mix with uniform: p̃ = (1-λ)·p_smooth + λ·(1/Q) [λ=0.5]
4. Compute weights: w = p̃^(-1)
5. Normalize: w = w / E[w] (mean=1)
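The steps above can be sketched directly. For brevity the Gaussian smoothing here is 1-D over the flattened bins, whereas the real prior is smoothed on the 2-D ab grid:

```python
import numpy as np

def rebalancing_weights(p: np.ndarray, lam: float = 0.5, sigma: float = 5.0) -> np.ndarray:
    """Inverse-frequency class weights: smooth, mix with uniform, invert, normalize."""
    # Steps 1-2: smooth the empirical distribution with a Gaussian kernel
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()
    p_smooth = np.convolve(p, kernel, mode="same")
    p_smooth /= p_smooth.sum()
    # Step 3: mix with the uniform distribution over Q bins
    q = len(p)
    p_tilde = (1 - lam) * p_smooth + lam / q
    # Steps 4-5: invert and normalize so the expected weight under p_tilde is 1
    w = 1.0 / p_tilde
    return w / (p_tilde * w).sum()

# Toy empirical distribution over 313 bins where the first 10 bins dominate
rng = np.random.default_rng(0)
p = rng.random(313)
p[:10] *= 100
p /= p.sum()

w = rebalancing_weights(p)
print(w[:10].mean() < w[10:].mean())  # common colors get smaller weights
```

The effect is that rare, saturated colors are up-weighted in the loss instead of being drowned out by the desaturated colors that dominate natural images.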
1. Apply temperature: Z̃ = Z / T
2. Compute softmax: P(·) = softmax(Z̃)
3. Expected value: âb = Σ P(q) · ab(q)
Lower T → more diverse/vibrant colors
Higher T → more conservative/muted colors
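Annealed-mean decoding for one pixel can be sketched with toy logits and bin centers (the real decoder applies this per pixel over the 313 in-gamut centers):

```python
import numpy as np

def annealed_mean(logits: np.ndarray, centers: np.ndarray, T: float = 0.38) -> np.ndarray:
    """Temperature-scaled softmax over ab bins, then the expected ab value."""
    z = logits / T           # T < 1 sharpens the distribution, T > 1 flattens it
    z -= z.max()             # numerical stability for the softmax
    p = np.exp(z)
    p /= p.sum()
    return p @ centers       # expectation: sum over q of P(q) * ab(q)

# Toy setup: three candidate ab points, logits favoring the last one
centers = np.array([[-50.0, 0.0], [0.0, 0.0], [50.0, 20.0]])
logits = np.array([0.0, 1.0, 2.0])

print(annealed_mean(logits, centers, T=0.38))  # close to the argmax bin [50, 20]
print(annealed_mean(logits, centers, T=1.0))   # pulled toward the distribution mean
```

This is why the temperature slider trades vibrancy for spatial smoothness: T → 0 approaches the per-pixel argmax (vivid but noisy), while larger T averages over bins (muted but stable).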
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Guidelines:
- Follow PEP 8 style
- Add tests for new features
- Update documentation
- Run black and flake8 before committing
This project is licensed under the MIT License - see the LICENSE file for details.
This implementation is based on:
Colorful Image Colorization
Richard Zhang, Phillip Isola, Alexei A. Efros
European Conference on Computer Vision (ECCV), 2016
Paper | Project Page
Original PyTorch implementation: richzhang/colorization
For questions or issues:
- Open a GitHub issue
- Email: your.email@example.com
If you use this code in your research, please cite:
@inproceedings{zhang2016colorful,
title={Colorful image colorization},
author={Zhang, Richard and Isola, Phillip and Efros, Alexei A},
booktitle={European conference on computer vision},
pages={649--666},
year={2016},
organization={Springer}
}
Made with ❤️ for reproducible deep learning research