Project Overview:

Vision Transformers (ViTs) have emerged as powerful models for computer vision tasks but often lack reliable uncertainty quantification. This project addresses that gap with MABViT2, a Model-Agnostic Bayesian Vision Transformer that adds uncertainty estimation to existing ViTs without architectural modifications.

Key Contributions:

Developed a Bayesian ViT model that quantifies uncertainty without architectural changes
Implemented approximate variational inference via Monte Carlo Dropout in the self-attention mechanism
Validated performance on benchmark datasets while maintaining competitive accuracy
Demonstrated superior out-of-distribution detection compared to deterministic baselines
Created a flexible implementation that can be applied to any existing Vision Transformer

Technical Details:

The MABViT2 framework applies Monte Carlo Dropout within the self-attention mechanism to approximate Bayesian inference. Dropout stays active at inference time, so each forward pass samples a different dropout mask. Running multiple passes on the same input yields a distribution of predictions whose spread captures epistemic uncertainty. The implementation is compatible with existing ViT architectures such as DeiT and ViT-B.
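The procedure described above can be illustrated with a small, self-contained sketch. This is not the MABViT2 implementation: it is a toy, pure-Python, single-head attention layer whose output is treated directly as class logits, and every name in it (attention_with_dropout, mc_dropout_predict, the parameters p and num_passes) is hypothetical. It assumes dropout is applied to the attention weights and that epistemic uncertainty is summarized as the mutual information between the prediction and the dropout mask, which is one common choice rather than something the project text specifies.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale the rest."""
    assert 0.0 <= p < 1.0
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def attention_with_dropout(query, keys, values, p, rng):
    """Toy single-head scaled dot-product attention for one query.

    Assumption: dropout is applied to the attention weights after softmax.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = dropout(softmax(scores), p, rng)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def entropy(probs):
    """Shannon entropy in nats, skipping zero-probability entries."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mc_dropout_predict(query, keys, values, p=0.1, num_passes=50, seed=0):
    """Run several stochastic passes and summarize the predictive distribution.

    Returns (mean class probabilities, total predictive entropy,
    mutual information used here as the epistemic uncertainty score).
    """
    rng = random.Random(seed)
    samples = [
        softmax(attention_with_dropout(query, keys, values, p, rng))
        for _ in range(num_passes)
    ]
    num_classes = len(samples[0])
    mean_probs = [
        sum(s[c] for s in samples) / num_passes for c in range(num_classes)
    ]
    total = entropy(mean_probs)                            # entropy of the mean
    expected = sum(entropy(s) for s in samples) / num_passes  # mean of entropies
    return mean_probs, total, total - expected


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
    probs, total, epistemic = mc_dropout_predict(query, keys, values, p=0.5)
    print("mean probabilities:", probs)
    print("predictive entropy:", total, "epistemic (MI):", epistemic)
```

With the dropout rate set to zero every pass is identical, so the mutual information collapses to zero. A larger spread across passes yields a larger score, and thresholding a score of this kind is one way it could be used for out-of-distribution detection.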

Significance:

This work bridges a critical gap in computer vision by enabling uncertainty quantification in transformer-based models, which is essential for high-stakes applications like medical imaging and autonomous driving. The model-agnostic approach makes it straightforward to apply to existing systems without architectural redesign.