Hao Fei

Research Fellow

School of Computing, National University of Singapore
3 Research Link, Singapore 117602

Profile

I am a research fellow at the National University of Singapore, working with Prof. Mong-Li Lee and Prof. Wynne Hsu at IDS, and partially with Prof. Tat-Seng Chua at NExT++. Previously, I was an associate researcher at Skywork AI Singapore, working with Prof. Shuicheng Yan (and, before that, an associate researcher at SEA AI lab). I received my Ph.D. from Wuhan University.

My research has been published in top-tier ML/NLP/CV/MM venues, e.g., ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM-MM, TPAMI, TKDE, TOIS, TNNLS, TASLP. I received the World AI Conference (WAIC) Rising Star award in 2023 and was ranked among the Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University. My papers have been selected as Most Influential Papers by Paper Digest and as ESI Highly Influential Papers, and have received the 2024 WAIC Outstanding Paper Award. I regularly serve as (Senior) Area Chair or Senior Program Committee member for top-tier conferences, and I served on the organizing committees of WSDM 2022, EMNLP 2023, and ACL 2024. I serve as an Associate Editor of several journals, including TALLIP and Neurocomputing, and I am a regularly invited reviewer for many journals, including TPAMI, IJCV, TNNLS, TKDE, and TOIS. My Ph.D. thesis received the Excellent Doctoral Thesis award of the Chinese Information Processing Society (CIPS). I won more than ten honors and awards during my Ph.D. studies.

Research

My research interests lie in NLP, CV, and their intersection (i.e., Multimodal/Vision-Language Learning). My long-term goal is to achieve human-level AI centered on multimodal LLMs and generalists. While I previously worked extensively on Structural Modeling of Language & Vision, my most recent focus is on unified multimodal generalists with human-level capacity (Modality, Task, Knowledge) and cognition (Reasoning, Affection), with the following key topics and representative works (detailed in my research statement):

▶  Multimodal Foundation Models: Unified multimodal LLMs and generalists.

  • NExT-GPT:      The 1st unified any-to-any multimodal LLM
  • Vitron:      The 1st unified pixel-level vision LLM for understanding, generating, segmenting, and editing images and videos
  • General-Level:      Pioneering the path of MLLM evaluation towards multimodal generalists
  • MLLM tutorial:      A pioneering tutorial series for MLLM techniques

▶  Capacity: Perception/generation of modalities/tasks, knowledge acquisition/information extraction.

  • Dysen-VDM:      Enhance temporal dynamics of text-to-video diffusion from LLMs
  • LayoutLLM-T2I:      Enhance fidelity of text-to-image diffusion with layout from LLMs
  • Finsta:      Enhance VLMs with fine-grained structural spatio-temporal alignment learning
  • MUIE:      The 1st benchmark for grounded multimodal universal information extraction

▶  Cognition: Cross-modal complex neuro/symbolic reasoning and human-centric affective computing.

  • Video-of-Thought:      The 1st video chain-of-thought reasoning framework
  • SymbCoT:      The 1st fully LLM-based logical reasoning framework based on chain-of-thought
  • THOR-ISA:      The 1st chain-of-thought reasoning framework for implicit sentiment analysis
  • PanoSent:      The 1st cognitive-level benchmark for multimodal conversational aspect-based sentiment analysis

Advertising

I am constantly looking for collaborations on the above topics. Remote collaboration is also supported. For promising students, I will provide sufficient GPUs. Hit me up if you are a Ph.D., master's, or bachelor's student and interested in what I am doing now. If you are from a Chinese university, there are also potential openings for research interns (e.g., self- or CSC-funded joint Ph.D. projects). Please describe your research status and attach your resume.

News

  2 Nov 2024

The video recording of our Multimodal LLM tutorial at ACM MM 2024 has been released on YouTube; all slides and materials are available on the homepage.

  25 Oct 2024

We will give a tutorial at ACM MM 2024 on Monday 28 Oct, 9:00-12:30, on the hot topic of MLLMs: Architecture, Modality, Function, Instruction, Hallucination, Evaluation, Reasoning and Beyond. Please stay tuned to the program; on-site and online attendance are both welcome.

  26 Sep 2024

Eight papers are accepted by NeurIPS 2024, all on multimodal LLMs and learning. Congrats to all my co-authors!

  20 Sep 2024

Three papers are accepted by EMNLP 2024 (Main/Findings), 1) Commonsense Reasoning, 2) Legal Text Generation, and 3) Survey on Conversational Understanding. Congrats to all my co-authors!

  16 Sep 2024

Ranked as Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University.

  16 July 2024

Four papers are accepted by ACM MM 2024, 1) Multimodal Conversational ABSA, 2) Speech Event Extraction, 3) Multimodal Coreference Resolution, and 4) Visual Programs. Congrats to all my co-authors!

  19 June 2024

The video of our Multimodal LLM tutorial at CVPR 2024 has been released on YouTube; all slides and materials are available on the homepage.
