Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus

¹Wuhan University,   ²Harbin Institute of Technology (Shenzhen)

Abstract

Much research effort has been devoted to semantic role labeling (SRL), which is crucial for natural language understanding. Supervised approaches have achieved impressive performance when large-scale corpora are available for resource-rich languages such as English. For low-resource languages with no annotated SRL dataset, however, it remains challenging to obtain competitive performance. Cross-lingual SRL is one promising way to address this problem, and it has achieved great advances with the help of model transferring and annotation projection. In this paper, we propose a novel alternative based on corpus translation, constructing high-quality training datasets for the target languages from the gold-standard SRL annotations of the source language. Experimental results on the Universal Proposition Bank show that the translation-based method is highly effective, and that the automatically produced pseudo datasets can improve target-language SRL performance significantly.

Presentation

Motivation

Model transferring and annotation projection are the two mainstream categories of cross-lingual transfer learning. The former builds cross-lingual models on language-independent features, such as cross-lingual word representations and universal POS tags, which can be transferred to target languages directly. The latter relies on a large-scale parallel corpus between the source and target languages, where the source-side sentences are annotated with SRL tags automatically by a source SRL labeler, and the source annotations are then projected onto the target-side sentences according to word alignments.


Annotation projection can be combined with model transferring naturally, but the projected SRL tags may contain much noise because the source-side annotations are produced automatically. A straightforward remedy is the translation-based approach, which has been demonstrated effective for cross-lingual dependency parsing. The key idea is to translate the gold-standard source training data directly into the target language, avoiding the problem of low-quality source annotations.


In this project, we study the translation-based method for cross-lingual SRL. Sentences of the source-language training corpus are translated into the target language, and the source SRL annotations are then projected onto the target side, resulting in a high-quality target-language SRL corpus that is used to train the target SRL model. Further, we merge the gold-standard source corpus and the translated target corpus, which can be regarded as a combination of the translation-based method and model transferring; a rough sketch of this merging step is given below.
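As an illustration of the merging step (in Python; the function and field names are ours, not the project's), each sentence can simply be tagged with a language ID so that the downstream model can tell the gold source data and the translated target data apart:

def build_training_set(gold_source, translated_target,
                       src_lang="en", tgt_lang="de"):
    """Blend gold-standard source data with translated target data.

    Each example carries a language ID; the PGN-BiLSTM model described
    below can use such IDs to generate language-specific parameters.
    """
    merged = [(sent, src_lang) for sent in gold_source]
    merged += [(sent, tgt_lang) for sent in translated_target]
    return merged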


SRL Translation

Step 1. Translating: produce target-language translations for the sentences of the source SRL data.

Step 2. Projecting: incrementally project the corresponding predicates and arguments of each source sentence onto its target translation (a sketch of both steps follows).
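The following Python sketch illustrates the two steps under simplifying assumptions: `translate` and `word_align` are placeholders for an off-the-shelf machine translation system and word aligner, and the BIO-style data structure is illustrative rather than the actual format used in the project.

from dataclasses import dataclass

@dataclass
class SRLSentence:
    tokens: list   # words of the sentence
    labels: list   # one BIO-style SRL tag per token, e.g. "B-A0", "O"

def project_annotations(src, tgt_tokens, alignments):
    """Project source SRL tags onto the target sentence (Step 2).

    `alignments` is a list of (src_idx, tgt_idx) word-alignment pairs.
    Unaligned target tokens keep the "O" tag; if several source tokens
    align to one target token, the first non-"O" tag wins. This is a
    placeholder heuristic, not the paper's exact projection rule.
    """
    tgt_labels = ["O"] * len(tgt_tokens)
    for s, t in alignments:
        if tgt_labels[t] == "O" and src.labels[s] != "O":
            tgt_labels[t] = src.labels[s]
    return SRLSentence(tgt_tokens, tgt_labels)

def translate_corpus(source_corpus, translate, word_align):
    """Run Step 1 (translating) and Step 2 (projecting) over a corpus."""
    target_corpus = []
    for src in source_corpus:
        tgt_tokens = translate(src.tokens)               # Step 1: translating
        alignments = word_align(src.tokens, tgt_tokens)  # word alignments
        target_corpus.append(
            project_annotations(src, tgt_tokens, alignments))
    return target_corpus

In practice the projection must also keep each predicate and its arguments consistent across the sentence; the incremental predicate-then-argument order of Step 2 is the natural place to enforce such constraints.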



SRL Model

Model architecture:


PGN-BiLSTM: For better exploitation of the blended corpus, we adopt a parameter generation network (PGN) to enhance the BiLSTM module, which captures the language differences effectively.
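A minimal PyTorch sketch of the idea is given below: instead of holding one fixed weight set, the BiLSTM's weights are generated on the fly from a learned language embedding, so the blended corpus can be modeled with softly shared, language-specific parameters. All sizes and names are illustrative assumptions, and the single-layer unrolled LSTM is a simplification of the actual module.

import torch
import torch.nn as nn

class PGNBiLSTM(nn.Module):
    """Sketch of a parameter generation network (PGN) BiLSTM.

    The LSTM weights are produced as a function of a learned language
    embedding: theta = generator(e_lang). Sizes and the single-layer
    unrolled implementation are illustrative, not the paper's exact
    configuration.
    """

    def __init__(self, input_dim, hidden_dim, num_langs, lang_dim):
        super().__init__()
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.lang_embed = nn.Embedding(num_langs, lang_dim)
        # Parameters of one LSTM direction:
        # 4 gates x (input weights + hidden weights + bias).
        n_params = 4 * hidden_dim * (input_dim + hidden_dim + 1)
        # Generator maps a language embedding to fwd + bwd LSTM weights.
        self.generator = nn.Linear(lang_dim, 2 * n_params)

    def _split(self, theta):
        h, d = self.hidden_dim, self.input_dim
        w_ih = theta[: 4 * h * d].view(4 * h, d)
        w_hh = theta[4 * h * d : 4 * h * (d + h)].view(4 * h, h)
        bias = theta[4 * h * (d + h):].view(4 * h)
        return w_ih, w_hh, bias

    def _run_direction(self, x, w_ih, w_hh, bias, reverse=False):
        batch, seq_len, _ = x.shape
        h = torch.zeros(batch, self.hidden_dim, device=x.device)
        c = torch.zeros(batch, self.hidden_dim, device=x.device)
        steps = range(seq_len - 1, -1, -1) if reverse else range(seq_len)
        outputs = [None] * seq_len
        for t in steps:
            # Standard LSTM cell with the generated weights.
            gates = x[:, t] @ w_ih.t() + h @ w_hh.t() + bias
            i, f, g, o = gates.chunk(4, dim=-1)
            c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
            h = torch.sigmoid(o) * torch.tanh(c)
            outputs[t] = h
        return torch.stack(outputs, dim=1)

    def forward(self, x, lang_id):
        # x: (batch, seq_len, input_dim); lang_id: scalar language index.
        theta = self.generator(self.lang_embed(lang_id))
        fwd, bwd = theta.chunk(2, dim=-1)
        out_f = self._run_direction(x, *self._split(fwd))
        out_b = self._run_direction(x, *self._split(bwd), reverse=True)
        return torch.cat([out_f, out_b], dim=-1)  # (batch, seq, 2*hidden)

Usage is the same as a plain BiLSTM, except that a language ID accompanies each batch:

model = PGNBiLSTM(input_dim=100, hidden_dim=64, num_langs=7, lang_dim=8)
x = torch.randn(2, 5, 100)        # (batch, seq_len, input_dim)
out = model(x, torch.tensor(3))   # language 3 -> output of shape (2, 5, 128)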

Experiment

Cross-Lingual Transfer from English to Other Languages.



Multi-Source Transfer.


Bilingual Transfer.


Performance by SRL Role.


Performance by Distance to the Predicate.


Paper

BibTeX

@inproceedings{fei2020crosslingual,
  author    = {Hao Fei and Meishan Zhang and Donghong Ji},
  title     = {Cross-Lingual Semantic Role Labeling with High-Quality Translated Training Corpus},
  booktitle = {Proceedings of the Annual Meeting of the Association for Computational Linguistics},
  year      = {2020},
}