Ali Diba

I am a scientist and PI at the Qatar Computing Research Institute (QCRI), where my research focuses on computer vision and multi-modal foundation models.

Prior to joining QCRI, I was a Staff Research Scientist at Lunit, working on AI systems for cancer detection and prediction. I received my PhD from KU Leuven, under the supervision of Prof. Luc Van Gool, where I investigated multi-task learning for vision models on edge devices and large-scale holistic video understanding. Earlier in my career, I co-founded Sensifai, a computer vision startup dedicated to large-scale media archive monitoring.

Email / CV / Google Scholar / Twitter / GitHub

Research

I'm interested in research and development in computer vision, deep neural networks, and foundation models. Currently, I'm working on multi-modal vision-language models for images, videos, and text in the humanitarian AI domain.

	KG-FairDiff: Knowledge Graph-Guided Prompt Refinement for Demographically Fair Text-to-Image Generation (ICML 2026)
	T-SiamTPN: Temporal Siamese Transformer Pyramid Networks for Robust and Efficient UAV Tracking (arXiv 2025)
	Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models (ICCV 2025)
	MV-MLM: Bridging Multi-View Mammography and Language for Breast Cancer Diagnosis and Risk Prediction (ICCV 2025)
	SelectiveKD: A Semi-Supervised Framework for Cancer Detection in DBT through Knowledge Distillation and Pseudo-Labeling (MICCAI 2025)
	Spatio-Temporal Convolution-Attention Video Network (ICCV 2023)
	3D CNNs with Adaptive Temporal Feature Resolutions (CVPR 2021)
	Vi2CLR: Video and Image for Visual Contrastive Learning of Representation (ICCV 2021)
Design and source code from Jon Barron's website.