I am a postdoctoral research fellow at the National University of Singapore, working with Prof. Tat-Seng Chua at the NExT++ Research Center, and with Prof. Wynne Hsu and Prof. Mong Li Lee at the Institute of Data Science. I am also an associate researcher at Kunlun 2050 Research, Skywork AI Lab Singapore, working with Prof. Shuicheng Yan (previously an associate researcher at Sea AI Lab). Prior to that, I received my Ph.D. degree from Wuhan University, and I interned at Baidu Inc. during my undergraduate studies.
My research lies at the intersection of Natural Language Processing (NLP) and Computer Vision (CV), i.e., Vision-Language Learning,
with broad interests spanning Large Language Models, Text-Image/Video/Audio/3D Modeling, Cross-modal Reasoning, Information Extraction, Affective Computing, and Structure Modeling.
I am devoted to constructing learning models, with the fundamental goal of building systems capable of human-level understanding of the world.
My ongoing research focuses on the particular angle of Structure-aware Intelligence Learning
(SAIL),
which aims to enhance the semantic understanding of varied modalities through intrinsic data structure modeling.
The SAIL idea works effectively for deep-learning-based AI and also holds for current large language models (LLMs),
ultimately helping to achieve AGI across universal modalities (world modeling).
See my research statement for more details.
I also firmly believe that the key to realizing human-level AGI lies in two fundamental aspects simultaneously:
A. human-level complex reasoning ability,
and B. mastery of world knowledge,
with neither able to do without the other.
My research has been published in top-tier ML/NLP/DM venues, including ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM MM, TPAMI, TKDE, TOIS, TNNLS, TASLP, etc. My Ph.D. thesis was awarded the Excellent Doctoral Thesis of the Chinese Information Processing Society (CIPS), and I received more than ten honors and awards during my Ph.D. studies. I have served as (Senior) Area Chair or Senior Program Committee member for top-tier conferences, such as ACL, AAAI, IJCAI, EMNLP, NAACL, WSDM, COLING, and ARR. I am a regularly invited reviewer for prestigious journals including TPAMI, TNNLS, TKDE, TOIS, TAFFC, and TASLP. I have served on the organizing committees of WSDM 2022 (Volunteer Chair), NSSDM 2023 (Program Chair), EMNLP 2023 (Workshop Chair), SSNLP 2023 (Organizing Committee), and ACL 2024 (Volunteer Chair). I am also an Associate Editor of several journals, including TALLIP and Neurocomputing.
I am constantly looking for collaborations, especially on the topics mentioned above; remote collaboration is also welcome. Promising students will be provided with sufficient GPU resources. Feel free to reach out if you are a Ph.D., master's, or bachelor's student interested in my current work. For students from Chinese universities, there are also vacancies for research interns (e.g., self-funded or CSC-funded joint Ph.D. projects). Please describe your research status and attach your resume.
The video recording of the Multimodal LLM tutorial at ACM MM 2024 has been released on YouTube; all slides and materials are available on the homepage.
• 25 Oct 2024: We will give a tutorial at ACM MM 2024 on Monday, 28 Oct, 9:00-12:30, on the hot topic of MLLMs: Architecture, Modality, Function, Instruction, Hallucination, Evaluation, Reasoning and Beyond. Please stay tuned to the program; on-site and online attendance are both welcome.
• 26 Sep 2024: Eight papers are accepted by NeurIPS 2024, all on Multimodal LLMs and Learning. Congrats to all my co-authors!
• 20 Sep 2024: Three papers are accepted by EMNLP 2024 (Main/Findings): 1) Commonsense Reasoning, 2) Legal Text Generation, and 3) a Survey on Conversational Understanding. Congrats to all my co-authors!
• 16 July 2024: Four papers are accepted by ACM MM 2024: 1) Multimodal Conversational ABSA, 2) Speech Event Extraction, 3) Multimodal Coreference Resolution, and 4) Visual Programs. Congrats to all my co-authors!