Hao Fei

Senior Research Fellow

School of Computing, National University of Singapore
3 Research Link, Singapore 117602

Profile

I am a senior research fellow at the National University of Singapore, working with Prof. Mong-Li Lee and Prof. Wynne Hsu at IDS, and also with Prof. Tat-Seng Chua at NExT++ and Prof. Shuicheng Yan. Previously, I was an associate researcher at Skywork AI Singapore and at SEA AI Lab. I received my Ph.D. from Wuhan University.

My research has been published in top-tier ML/NLP/CV/MM venues, e.g., ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM-MM, TPAMI, TKDE, TOIS, TNNLS, and TASLP. I received the World AI Conference (WAIC) Rising Star award in 2023. My papers have been selected as Most Influential Papers by Paper Digest and as ESI Highly Influential Papers, and one received the 2024 WAIC Outstanding Paper Award. I was also ranked among the Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University. I regularly serve as (Senior) Area Chair or Senior Program Committee member for top-tier conferences, and I served on the organizing committees of WSDM 2022, EMNLP 2023, ACL 2024, and ACM MM 2025. I serve as an Associate Editor for journals including TALLIP and Neurocomputing, and I am a regularly invited reviewer for many journals, including TPAMI, IJCV, TNNLS, TKDE, and TOIS. My Ph.D. thesis received the Excellent Doctoral Thesis Award of the Chinese Information Processing Society (CIPS), and I won more than ten honors and awards during my Ph.D. studies.

Research

My research interests lie in NLP, CV, and their intersection (i.e., multimodal/vision-language learning). My long-term goal is to achieve human-level AI centered on multimodal LLMs and generalists. While I previously worked extensively on structural modeling of language and vision, my most recent focus is on unified multimodal generalists that approach human-level capacity (modality, task, knowledge) and cognition (reasoning, affection), with the following key topics and representative works (detailed in my research statement):

▶ Multimodal Foundation Models: Unified multimodal LLMs and generalists.

NExT-GPT: The 1st unified any-to-any multimodal LLM
Vitron: The 1st unified pixel-level vision LLM for understanding, generating, segmenting, and editing images and videos
General-Level: Pioneer the path of MLLM evaluations towards multimodal generalists
MLLM tutorial: A pioneering & comprehensive tutorial series for MLLM techniques

▶ Capacity: Comprehension/generation of modalities/tasks, knowledge acquisition.

JavisDiT: A novel Diffusion Transformer for synchronized audio-video generation
Any2Caption: A SoTA video generation framework from any input conditions
Dysen-VDM: Enhance the temporal dynamics of text-to-video diffusion with LLMs
LayoutLLM-T2I: Enhance the fidelity of text-to-image diffusion with layouts from LLMs
MUIE: The 1st benchmark for grounded multimodal universal information extraction

▶ Cognition: Cross-modal neuro-symbolic reasoning, human-centric affective computing.

MCoT-Survey: The 1st systematic survey of MCoT reasoning
Video-of-Thought: The 1st video chain-of-thought reasoning framework
SymbCoT: The 1st fully LLM-based logical reasoning framework built on chain-of-thought
THOR-ISA: The 1st chain-of-thought reasoning framework for implicit sentiment analysis
PanoSent: The 1st cognitive-level benchmark for multimodal conversational aspect-based sentiment analysis
AvaMERG: The 1st avatar-based multimodal empathetic conversation benchmark

I also extensively explore AI for medicine, healthcare, clinical psychology, and social studies by integrating advanced LLM/agent methodologies.

Advertising

I am always looking for collaborations on the above topics, and remote collaboration is welcome. I will provide sufficient GPUs for promising students. If you are a Ph.D., master's, or bachelor's student interested in my current work, feel free to reach out. If you are from a Chinese university, there may also be openings for research interns (e.g., self-funded or CSC-funded joint Ph.D. projects). Please describe your current research status and attach your resume.

News

• 16 July 2025

Three papers have been accepted to ACM MM 2025: 1) SSM for Salient Object Detection, 2) ViTCoT, and 3) MCM-DPO. Congrats to all my co-authors!

• 26 June 2025

Four papers have been accepted to ICCV 2025: 1) PhysSplat, 2) Explainable Driving, 3) Derm1M: Clinical Ontology Knowledge, and 4) Iris: Self-Refining for GUI Agent. Congrats to all my co-authors!

• 20 May 2025

We will give a tutorial at CVPR 2025 on June 11 on the hot topic of MLLMs: Evaluations and Benchmarks. Please stay tuned for the program; both on-site and online attendance are welcome.

• 16 May 2025

Three papers have been accepted to ACL 2025 (Main): 1) Aristotle: Logical Reasoning, 2) Metaphor Detection, and 3) Cross-Lingual and Cross-Modal Hallucination Benchmark. Congrats to all my co-authors!

• 9 May 2025

We are excited to release the project On Path to Multimodal Generalist: General-Level and General-Bench, in which we present 1) 🚀 General-Level, a novel 5-level evaluation framework that grades multimodal generalists (multimodal LLMs/agents) by their level of synergy; and 2) 🍕 General-Bench, a companion massive-scale (325K) multimodal benchmark that encompasses a broader spectrum of skills, modalities, formats, and capabilities. Submissions to our leaderboards are welcome!

• 1 May 2025

Three papers have been accepted to ICML 2025: 1) Path to Multimodal Generalist (Spotlight!), 2) VistaDPO: Video-DPO, and 3) Privacy Memorization in MLLM. Congrats to all my co-authors!

• 10 Apr 2025