A toolkit for efficient vision-language pre-training and fine-tuning: Token Merging, LoRA/QLoRA, Knowledge Distillation, and more.
Vision-language models (LLaVA, InternVL, Qwen-VL, etc.) are expensive to fine-tune and slow to deploy. EfficientVLP is a practical toolkit that combines several orthogonal efficiency techniques under one roof:
| Technique | Speed-up | Memory ↓ | Quality impact |
|---|---|---|---|
| Token Merging (ToMe) | 1.5–2.0× | 20–30% | ≈ baseline |
| LoRA fine-tuning | — | 60–80% | ≈ full fine-tuning |
| QLoRA (4-bit) | — | 85–90% | −0.5–1.5% |
| Flash Attention 2 | 1.5–3× | 30–50% | identical |
| Knowledge Distillation | — | via smaller model | −2–5% |
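Token Merging is the only technique in the table that changes the forward pass itself: at each Transformer block it pairs up the most similar tokens and averages them, so later blocks process fewer tokens. A minimal sketch of the bipartite soft-matching step behind ToMe (the function below is illustrative, not the toolkit's API; it assumes an even token count):

```python
import torch

def bipartite_soft_matching(x: torch.Tensor, r: int) -> torch.Tensor:
    """Merge the r most similar token pairs (illustrative sketch, not the library API).

    x: (B, N, C) token features, N even. Tokens are split into two alternating
    sets A and B; each A-token is matched to its most similar B-token, and the
    r highest-similarity pairs are averaged together.
    """
    B, N, C = x.shape
    metric = x / x.norm(dim=-1, keepdim=True)   # unit vectors → cosine similarity
    a, b = metric[:, ::2], metric[:, 1::2]      # alternating split into sets A and B
    scores = a @ b.transpose(-1, -2)            # (B, N/2, N/2) pairwise similarities

    best_val, best_idx = scores.max(dim=-1)     # best B-match for each A-token
    order = best_val.argsort(dim=-1, descending=True)
    merged_a = order[:, :r]                     # A-tokens to merge away
    kept_a = order[:, r:]                       # A-tokens to keep

    xa, xb = x[:, ::2], x[:, 1::2]
    # average each merged A-token into its matched B-token
    dst_idx = best_idx.gather(1, merged_a)      # (B, r) target positions in set B
    src = xa.gather(1, merged_a.unsqueeze(-1).expand(-1, -1, C))
    xb = xb.scatter_reduce(1, dst_idx.unsqueeze(-1).expand(-1, -1, C),
                           src, reduce="mean", include_self=True)

    kept = xa.gather(1, kept_a.unsqueeze(-1).expand(-1, -1, C))
    return torch.cat([kept, xb], dim=1)         # (B, N - r, C)
```

The real algorithm additionally tracks each merged token's "size" so attention can be weighted proportionally; this sketch keeps only the matching and averaging.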
## Installation

```bash
git clone https://github.com/suncatchin/efficient-vlp
cd efficient-vlp
pip install -e ".[full]"
```

## Quick Start

### Token Merging

```python
from efficient_vlp.token_merging import patch_model, ToMeConfig
import timm

model = timm.create_model("vit_large_patch14_clip_224.openai", pretrained=True)
config = ToMeConfig(r=8)  # merge 8 token pairs per block
patch_model(model, config)
# The model now runs with ~15% fewer tokens → faster forward pass
```

### LoRA fine-tuning

```bash
python scripts/train_lora.py \
    --model_id llava-hf/llava-v1.6-mistral-7b-hf \
    --dataset HuggingFaceM4/VQAv2 \
    --lora_rank 16 \
    --lora_alpha 32 \
    --output_dir ./checkpoints/llava-lora/
```

### QLoRA (4-bit)

```bash
python scripts/train_lora.py \
    --model_id Qwen/Qwen2-VL-7B-Instruct \
    --qlora \
    --lora_rank 64 \
    --output_dir ./checkpoints/qwen2vl-qlora/
```

### Benchmark Token Merging

```bash
python scripts/benchmark_tome.py \
    --model clip-vit-large-patch14 \
    --r 0 4 8 12 16 \
    --batch_size 64
```

## Project Structure

```
efficient_vlp/
├── token_merging/
│   ├── tome.py                 # Core ToMe algorithm (bipartite matching)
│   └── merge_utils.py          # Merge / unmerge helpers
├── lora/
│   ├── lora_layer.py           # LoRA linear layer implementation
│   └── qlora.py                # 4-bit QLoRA with bitsandbytes
├── distillation/
│   └── kd_trainer.py           # Knowledge distillation training loop
├── pruning/
│   └── structured_pruner.py    # Structured head/neuron pruning
└── trainer.py                  # Unified training entry point
scripts/
├── train_lora.py
└── benchmark_tome.py
```
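`distillation/kd_trainer.py` implements the knowledge-distillation training loop; the loss it optimizes is, in essence, the classic soft-target formulation of Hinton et al. A minimal sketch (function name and defaults are illustrative, not the toolkit's exact API):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Soft-target KD loss (illustrative sketch, not the toolkit's exact API).

    Blends KL divergence between temperature-softened teacher/student
    distributions with ordinary cross-entropy on the hard labels.
    """
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                                # T² rescales gradient magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```

Setting `alpha=1.0` trains on the teacher's logits alone; `T > 1` softens both distributions so the student also learns the teacher's inter-class similarities.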
Token Merging, evaluated on ViT-L/14 (ImageNet-1k, batch size 64, single A100):
| r (tokens merged/block) | Throughput (img/s) | Top-1 Acc. |
|---|---|---|
| 0 (baseline) | 412 | 75.3% |
| 4 | 498 (+21%) | 75.1% |
| 8 | 573 (+39%) | 74.8% |
| 12 | 635 (+54%) | 74.3% |
| 16 | 682 (+66%) | 73.5% |
Fine-tuned LLaVA-1.6-Mistral-7B on VQAv2 validation (A100 80GB):
| Method | GPU Mem | Train Time | VQAv2 Acc. |
|---|---|---|---|
| Full FT | 75GB | 14h | 81.4% |
| LoRA r=16 | 28GB | 5h | 81.0% |
| LoRA r=64 | 38GB | 7h | 81.3% |
| QLoRA r=64 | 18GB | 8h | 80.7% |
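For intuition about why LoRA is so memory-light: `lora/lora_layer.py` boils down to freezing the pretrained weight W and learning a low-rank update, y = Wx + (α/r)·BAx, so only rank·(in + out) parameters per layer need gradients and optimizer state. A minimal sketch (class name and defaults are illustrative, not the toolkit's class):

```python
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base linear layer plus a trainable low-rank update (sketch)."""

    def __init__(self, base: nn.Linear, rank: int = 16, alpha: int = 32,
                 dropout: float = 0.05):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)            # freeze pretrained weights
        self.lora_A = nn.Parameter(torch.empty(rank, base.in_features))
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        nn.init.kaiming_uniform_(self.lora_A, a=math.sqrt(5))
        self.scaling = alpha / rank
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # lora_B starts at zero, so at step 0 this is exactly the base layer
        return self.base(x) + self.dropout(x) @ self.lora_A.T @ self.lora_B.T * self.scaling
```

Because B is zero-initialized, training starts from the pretrained model exactly, and the update BA can be folded back into W after training for zero inference overhead.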
## Configuration

```python
from efficient_vlp.token_merging import ToMeConfig
from efficient_vlp.lora import LoRAConfig

# Token Merging
tome_cfg = ToMeConfig(
    r=8,              # tokens merged per Transformer block
    sx=2, sy=2,       # stride for source token selection
    use_rand=True,    # random source selection (avoids bias)
    merge_attn=True,  # also merge in attention computation
)

# LoRA
lora_cfg = LoRAConfig(
    rank=16,
    alpha=32,
    dropout=0.05,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)
```

## Citation

```bibtex
@misc{xu2024efficientvlp,
  title={EfficientVLP: A Practical Toolkit for Efficient Vision-Language Pre-training},
  author={Xu, Haowen},
  year={2024},
  url={https://github.com/suncatchin/efficient-vlp}
}
```

## License

MIT License