Ali Diba

I am a scientist and PI at the Qatar Computing Research Institute (QCRI), where my research focuses on computer vision and multi-modal foundation models.

Prior to joining QCRI, I was a Staff Research Scientist at Lunit, working on AI systems for cancer detection and prediction. I received my PhD from KU Leuven, under the supervision of Prof. Luc Van Gool, where I investigated multi-task learning for vision models on edge devices and large-scale holistic video understanding. Earlier in my career, I co-founded Sensifai, a computer vision startup dedicated to large-scale media archive monitoring.

Email  /  CV  /  Google Scholar  /  Twitter  /  GitHub

Research

I'm interested in research and development in computer vision, deep neural networks, and foundation models. Currently, I'm working on multi-modal vision-language models for images, videos, and text in the humanitarian AI domain.

KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation
ICML 2026

T-SiamTPN: Temporal Siamese Transformer Pyramid Networks for Robust and Efficient UAV Tracking
arXiv 2025

Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models
ICCV 2025

MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction
ICCV 2025

SelectiveKD: A semi-supervised framework for cancer detection in DBT through Knowledge Distillation and Pseudo-labeling
MICCAI 2025

Spatio-Temporal Convolution-Attention Video Network
ICCV 2023

3D CNNs with Adaptive Temporal Feature Resolutions
CVPR 2021

Vi2CLR: Video and Image for Visual Contrastive Learning of Representation
ICCV 2021


Design and source code from Jon Barron's website.