Sina Zarrieß, David Schlangen

Incremental "real-world" REG with mouse movements as non-verbal feedback.

#### Abstract

Research on generating referring expressions (REG) has so far mostly focussed on “one-shot reference”, where the aim is to generate a single, written, discriminating expression. In interactive settings, however, it is not uncommon for reference to be established in “installments”, where referring information is offered piecewise until success has been confirmed (Clark and Wilkes-Gibbs, 1986; Clark and Krych, 2004). In practice, interactive REG has rarely been studied empirically or implemented in realistic systems. We show that a collaborative, interactive approach to REG can be advantageous in technical systems that only have uncertain access to object attributes and categories. We train a recently introduced model of grounded word meaning on a data set of REs for objects in real-world images and learn to predict semantically appropriate expressions. In a human evaluation, we observe that users are sensitive to inadequate object names, which unfortunately are not unlikely to be generated from low-level visual input. We propose a solution inspired by human task-oriented interaction and implement strategies for avoiding and repairing semantically inaccurate words. We enhance a word-based REG with context-aware, referential installments and find that they substantially improve the referential success of the system.

#### Methods

This work is based on the words-as-classifiers (WAC) model, which grounds word meanings in visual instances of real-world referents. We set up an interactive, incremental decoding procedure that reacts to the listener's non-verbal actions.
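
The idea can be illustrated with a minimal sketch. It is not the system described in the publications below: the two-dimensional features, the hand-set weights, the confidence-based word ordering, and the names `word_score`, `rank_words`, and `refer_in_installments` are all assumptions made for illustration. Each word is modelled as a logistic classifier over an object's visual features, and words are offered one installment at a time until a feedback callback (standing in for the listener's mouse position) signals success.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# A word's meaning is a binary classifier: (weights, bias). Hypothetical format.
Classifier = Tuple[Sequence[float], float]


def word_score(clf: Classifier, features: Sequence[float]) -> float:
    """Probability that the word applies to an object with these features."""
    weights, bias = clf
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def rank_words(
    classifiers: Dict[str, Classifier], features: Sequence[float]
) -> List[Tuple[str, float]]:
    """Score every word on the target and sort by descending confidence."""
    scored = [(word, word_score(clf, features)) for word, clf in classifiers.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def refer_in_installments(
    classifiers: Dict[str, Classifier],
    features: Sequence[float],
    listener_on_target: Callable[[List[str]], bool],
    max_installments: int = 3,
) -> List[str]:
    """Offer one word per installment and stop once feedback signals success."""
    uttered: List[str] = []
    for word, _score in rank_words(classifiers, features)[:max_installments]:
        uttered.append(word)
        # Assumed feedback signal: True once the listener's cursor is on the target.
        if listener_on_target(uttered):
            break
    return uttered


# Toy data (invented): features are [redness, roundness].
TOY_CLASSIFIERS: Dict[str, Classifier] = {
    "red": ([4.0, 0.0], -2.0),
    "ball": ([0.0, 4.0], -2.0),
    "car": ([0.0, -4.0], 2.0),
}
TARGET_FEATURES = [0.9, 0.6]

# Simulated listener who finds the target after hearing two words.
result = refer_in_installments(
    TOY_CLASSIFIERS, TARGET_FEATURES, lambda uttered: len(uttered) >= 2
)
print(result)
```

In this sketch the generator commits first to the word it is most confident about and adds further words only if the feedback callback has not yet reported success. A real system would replace the toy weights with classifiers trained on referring expressions and the callback with tracked mouse movements.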

#### Publications

1. Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images. In Proceedings of INLG 2017 (demo papers), 2017.
BibTeX
@inproceedings{Zarrieß-2017-3,
author = {Zarrieß, Sina and Schlangen, David},
booktitle = {Proceedings of INLG 2017 (demo papers)},
title = {{Refer-iTTS: A System for Referring in Spoken Installments to Objects in Real-World Images}},
year = {2017}
}

2. Easy Things First: Installments Improve Referring Expression Generation for Objects in Photographs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), 2016.
BibTeX
@inproceedings{Zarrieß-2016,
author = {Zarrieß, Sina and Schlangen, David},
booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016)},
location = {Berlin, Germany},
title = {{Easy Things First: Installments Improve Referring Expression Generation for Objects in Photographs}},
year = {2016}
}
