New or Old? Exploring How Pre-Trained Language Models Represent Discourse Entities

Loáiciga, Sharid and Beyer, Anne and Schlangen, David

Recent research shows that pre-trained language models, built to generate text conditioned on some context, learn to encode syntactic knowledge to a certain degree. This has motivated researchers to move beyond the sentence level and look into their ability to encode less-studied discourse-level phenomena. In this paper, we add to the body of probing research by investigating discourse entity representations in large pre-trained language models in English. Motivated by early theories of discourse and key pieces of previous work, we focus on the information status of entities as discourse-new or discourse-old. We present two probing models, one based on binary classification and the other on sequence labeling. The results of our experiments show that pre-trained language models do encode information on whether or not an entity has been introduced before in the discourse. However, this information alone is not sufficient to find the entities in a discourse, opening up interesting questions about the definition of entities for future work.

In Proceedings of the 29th International Conference on Computational Linguistics, 2022
@inproceedings{Loaiciga-2022,
  title = {New or Old? Exploring How Pre-Trained Language Models Represent Discourse Entities},
  author = {Lo{\'a}iciga, Sharid and Beyer, Anne and Schlangen, David},
  booktitle = {Proceedings of the 29th International Conference on Computational Linguistics},
  month = oct,
  year = {2022},
  address = {Gyeongju, Republic of Korea},
  publisher = {International Committee on Computational Linguistics},
  url = {https://aclanthology.org/2022.coling-1.73},
  pages = {875--886}
}