Dimensions of context in referential, visually grounded language, and similarity relations that can be derived from it.

Abstract

Various routes for linking language to extralinguistic context have been explored in recent years. Much research has modeled visual representations directly (Matuszek et al., 2012; Schlangen et al., 2016) or mapped them into a distributional or multi-modal distributional space. We investigate models of word meaning that link visual to lexical information and explore several ways of combining them. We present a new model that learns individual perceptual predictors for words, linking visual and distributional aspects of word meaning during training. We test the model on object naming tasks and show that it improves over previous models for zero-shot learning. At the same time, we explore different ways of learning continuous meaning representations from multi-modal contexts: through grounding in visual representations, through expressions that refer to the same object, and through expressions that refer to different objects in the same scene. We show that embeddings derived from these contexts capture complementary aspects of similarity, even though they do not outperform textual embeddings trained on very large amounts of raw text on standard similarity benchmarks.
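
As a concrete picture of the "individual perceptual predictors" mentioned above, here is a minimal sketch in the spirit of the words-as-classifiers approach (Schlangen et al., 2016): one binary classifier per word over visual object features. The function names, the data layout, and the choice of scikit-learn logistic regression are illustrative assumptions, not the implementation from the papers below.

```python
# Sketch: one binary "perceptual predictor" per word over visual features,
# in the spirit of words-as-classifiers (Schlangen et al., 2016).
# All names and the choice of logistic regression are assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_word_classifiers(obj_feats, refexps, min_freq=5):
    """obj_feats: (n, d) visual feature matrix, one row per object;
    refexps[i]: set of words used in expressions referring to object i."""
    counts = {}
    for words in refexps:
        for w in words:
            counts[w] = counts.get(w, 0) + 1
    classifiers = {}
    for w, freq in counts.items():
        if freq < min_freq:
            continue
        # Positives: objects the word was used for; negatives: all others.
        y = np.array([int(w in words) for words in refexps])
        classifiers[w] = LogisticRegression(max_iter=1000).fit(obj_feats, y)
    return classifiers

def name_object(classifiers, x, topk=5):
    """Rank the vocabulary by each word's classifier score for object x."""
    scores = {w: clf.predict_proba(x.reshape(1, -1))[0, 1]
              for w, clf in classifiers.items()}
    return sorted(scores, key=scores.get, reverse=True)[:topk]
```

Naming an object then reduces to ranking the vocabulary by classifier confidence; words that never occurred with visual training data get no classifier at all, which is the gap that the cross-modal mapping under Methods addresses.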

Methods

Cross-modal mapping, zero-shot learning, learning word embeddings from non-linguistic context.
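
To make the first two keywords concrete: cross-modal mapping learns a function from visual features into a distributional word space, and zero-shot naming follows because words without any visual training examples can still be ranked through their textual embeddings. Below is a minimal sketch, assuming a linear map fit with ridge regression and cosine-based ranking; all names here are hypothetical.

```python
# Sketch of cross-modal mapping for zero-shot naming: learn a linear map
# from visual features into a distributional word space (ridge regression
# here is an assumption), then rank the full vocabulary -- including words
# never seen with visual data -- by cosine similarity to the projection.
import numpy as np
from sklearn.linear_model import Ridge

def fit_crossmodal_map(obj_feats, gold_names, word_vecs):
    """obj_feats: (n, d_v) visual features; gold_names: length-n list of
    object names; word_vecs: dict word -> (d_w,) distributional vector."""
    targets = np.stack([word_vecs[w] for w in gold_names])
    return Ridge(alpha=1.0).fit(obj_feats, targets)

def zero_shot_name(mapper, x, word_vecs, topk=5):
    """Project object x into word space and return the nearest words."""
    proj = mapper.predict(x.reshape(1, -1))[0]
    norm = np.linalg.norm
    scores = {w: float(proj @ v) / (norm(proj) * norm(v) + 1e-9)
              for w, v in word_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:topk]
```

The linear map is only one simple choice; the publications below explore several ways of combining such distributional information with the per-word visual predictors sketched above.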

Publications

  1. Deriving continuous grounded meaning representations from referentially structured multimodal contexts. Zarrieß, Sina, and Schlangen, David. In Proceedings of EMNLP 2017 – Short Papers, 2017. [PDF]
    BibTeX
    @inproceedings{Zarrieß-2017,
      author = {Zarrieß, Sina and Schlangen, David},
      booktitle = {Proceedings of EMNLP 2017 -- Short Papers},
      location = {Copenhagen},
      title = {{Deriving continuous grounded meaning representations from referentially structured multimodal contexts}},
      year = {2017}
    }
    
  2. Obtaining referential word meanings from visual and distributional information: Experiments on object naming. Zarrieß, Sina, and Schlangen, David. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL), 2017. [PDF]
    BibTeX
    @inproceedings{Zarrieß-2017-1,
      author = {Zarrieß, Sina and Schlangen, David},
      booktitle = {Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL)},
      title = {{Obtaining referential word meanings from visual and distributional information: Experiments on object naming}},
      year = {2017}
    }
    
  3. Is this a Child, a Girl, or a Car? Exploring the Contribution of Distributional Similarity to Learning Referential Word Meanings. Zarrieß, Sina, and Schlangen, David. In Short Papers – Proceedings of the Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL), 2017. [PDF]
    BibTeX
    @inproceedings{Zarrieß-2017-2,
      author = {Zarrieß, Sina and Schlangen, David},
      booktitle = {Short Papers -- Proceedings of the Annual Meeting of the European Chapter of the Association for Computational Linguistics (EACL)},
      location = {Valencia, Spain},
      title = {{Is this a Child, a Girl, or a Car? Exploring the Contribution of Distributional Similarity to Learning Referential Word Meanings}},
      year = {2017}
    }
    
  4. Towards Generating Colour Terms for Referents in Photographs: Prefer the Expected or the Unexpected? Zarrieß, Sina, and Schlangen, David. In Proceedings of the 9th International Natural Language Generation conference, 2016. [Abs] [PDF]
    BibTeX
    @inproceedings{Zarrieß-2016-4,
      author = {Zarrieß, Sina and Schlangen, David},
      booktitle = {Proceedings of the 9th International Natural Language Generation conference},
      location = {Edinburgh, UK},
      pages = {246--255},
      publisher = {Association for Computational Linguistics},
      title = {{Towards Generating Colour Terms for Referents in Photographs: Prefer the Expected or the Unexpected?}},
      year = {2016}
    }
    

Code