The semantics of the world can essentially be organized in structured formats, and data of different modalities comes with structural representations. For example, understanding in almost all NLP applications can be viewed as a hierarchy with different levels. For other modalities (e.g., vision), the key likewise lies in comprehending semantic structure, such as through the scene graph representation. The following figure exemplifies the linguistic syntax structures in NLP (dependency trees and constituency grammar) as well as the visual scene graph structure in CV. Besides, world knowledge has long been represented in structured formats, i.e., knowledge graphs. Human-level reasoning also largely follows a structural manner.
Thus, the essence of semantic understanding of language, vision, etc., lies in understanding the intrinsic semantic structures, which motivates my research angle of Structure-aware Intelligence Learning (SAIL).
With the idea of SAIL, I divide my research into three key branches: structure-aware NLP, structure-aware MM, and structure-aware LLM.
Starting with deep learning-based semantic understanding in the NLP area, I engaged in exploring structure-aware NLP.
Later, I extended the SAIL idea to structure-aware MM.
The recent rise and great triumph of LLMs have revealed the great potential of this path toward AGI.
Correspondingly, I have most recently been integrating structural awareness into LLMs for semantic understanding, i.e., structure-aware LLM.
The ultimate goal is thus to realize human-level AGI for universal modalities by modeling the semantic structures of the world.
To achieve the AGI goal via SAIL in a way that best aligns with human society, several further targets should and will be pursued: efficacy, interpretability, robustness (generalizability), efficiency (scalability), and trustworthiness.
In the following Figure, I summarize and illustrate the big picture of my research goal.
My research scope covers Natural Language Processing (NLP) and the intersection of NLP and Computer Vision (CV), i.e., Vision-Language Learning or Multimodal Machine Learning.
My research is organized into the following blocks, with selected publications [View complete publications]: