I am a senior postdoctoral researcher at the University of Oxford, working jointly with Prof. Yarin Gal in the OATML Group and Prof. Chris Holmes in the Big Data Institute. Previously, I was a senior research fellow at the National University of Singapore, where I worked with Prof. Mong-Li Lee, Prof. Wynne Hsu, Prof. Tat-Seng Chua and Prof. Shuicheng Yan. I have also been a visiting researcher at Microsoft Research Asia and an associate researcher at Skywork AI Singapore and SEA AI Lab. I received my Ph.D. from Wuhan University.
My research has been published in top-tier ML/NLP/CV/MM venues, e.g., ICML, NeurIPS, ACL, CVPR, AAAI, WWW, SIGIR, IJCAI, EMNLP, ACM-MM, TPAMI, TKDE, TOIS, TNNLS, TASLP. I received the World AI Conference (WAIC) Rising Star award in 2023 and was ranked among the Top 2% Scientists Worldwide 2024 (Single Year) by Stanford University. My papers have been selected as Most Influential Papers by Paper Digest and as ESI Highly Influential Papers, and have received the 2024 WAIC Outstanding Paper Award. I regularly serve as (Senior) Area Chair or Senior Program Committee member for top-tier conferences, and I served on the organizing committees of WSDM 2022, EMNLP 2023, ACL 2024, and ACM MM 2025. I am an Associate Editor of several journals, including TALLIP and Neurocomputing, and a regularly invited reviewer for many journals, including TPAMI, IJCV, TNNLS, TKDE, and TOIS. My Ph.D. thesis received the Excellent Doctoral Thesis award of the Chinese Information Processing Society (CIPS), and I won more than ten honors and awards during my Ph.D. studies.
My research interests lie in NLP, CV, and their intersection (i.e., Multimodal/Vision-Language Learning). My long-term goal is to achieve human-level AI centered around multimodal LLMs and generalists. While I previously worked extensively on Structural Modeling of Language & Vision, my recent focus is on unified multimodal generalists with human-level capacity (Modality, Task, Knowledge) and cognition (Reasoning, Affection), with the following key topics and representative works (detailed in my research statement):
▶ Multimodal Foundation Models: Unified multimodal LLMs and generalists.
▶ Capacity: Comprehension/generation of modalities/tasks, knowledge acquisition.
▶ Cognition: Cross-modal neuro-symbolic reasoning, human-centric affective computing.
I also extensively explore AI for science, including 1) clinical psychology & social studies, 2) bio-/medicine & healthcare, and 3) material science, by integrating advanced LLM/agent methodologies.
I am constantly looking for collaborations on the above topics, and remote collaboration is also welcome. For promising students I will provide sufficient GPUs. Hit me up if you are a Ph.D./master's/bachelor's student interested in what I am doing now (there are potential vacancies for research interns, RAs, and visitors). For students from the University of Oxford, I'm particularly looking for collaborations on world modeling + AI scientist. Please describe your current research status and attach your resume and statement.
Five papers are accepted by ICLR 2026, 1) JavisDiT++, 2) JavisDiT, 3) LogicReward, 4) Interleaved Reasoning and 5) Cognitive Emotion Reasoning. Congrats to all my co-authors!
• 14 Nov 2025: We are thrilled to release UniVA: Universal Video Agent, an open-source next-generation video generalist! UniVA features: 1) 🤖 Unified Agentic System: a one-stop, omnipotent, highly automated, interactive and proactive video creation station, with deep memory and planner–executor synergy. 2) 🎬 Powerful Creation: MCP-native modular system covering understanding, editing, tracking, and any-conditioned video generation with industrial-grade cinematic quality. Try the online demo now! Check the paper.
• 8 Nov 2025: Two papers are accepted by AAAI 2026, 1) 4D Generation and 2) DragNeXt. Congrats to all my co-authors!
• 25 Sep 2025: Four papers are accepted by NeurIPS 2025, 1) JavisGPT, 2) MuSLR, 3) VimoRAG and 4) Visual Thoughts. Congrats to all my co-authors!
• 21 Aug 2025: Three papers are accepted by EMNLP 2025, 1) 3D Emotional Facial Generation, 2) Financial QA, and 3) Legal LLM. Congrats to all my co-authors!
• 16 July 2025: Four papers are accepted by ACM MM 2025, 1) SSM for Salient Object Detection, 2) FormFactory, 3) ViTCoT and 4) MCM-DPO. Congrats to all my co-authors!
• 26 June 2025: Four papers are accepted by ICCV 2025, 1) PhysSplat, 2) Explainable Driving, 3) Derm1M: Clinical Ontology Knowledge and 4) Iris: Self-Refining for GUI Agent. Congrats to all my co-authors!