Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft

Kranti, Chalamalasetti and Hakimov, Sherzod and Schlangen, David

In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks. In this work, we investigate the use of large language models (LLMs) to predict the sequence of actions taken by the Builder. Leveraging LLMs’ in-context learning abilities, we use few-shot prompting techniques that significantly improve performance over baseline methods. Additionally, we present a detailed analysis of the remaining performance gaps to guide future work.

In Findings of the Association for Computational Linguistics: EMNLP 2024, 2024
@inproceedings{Chalamalasetti-2024,
  title = {Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on {M}inecraft},
  author = {Kranti, Chalamalasetti and Hakimov, Sherzod and Schlangen, David},
  editor = {Al-Onaizan, Yaser and Bansal, Mohit and Chen, Yun-Nung},
  booktitle = {Findings of the Association for Computational Linguistics: EMNLP 2024},
  month = nov,
  year = {2024},
  address = {Miami, Florida, USA},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2024.findings-emnlp.652/},
  doi = {10.18653/v1/2024.findings-emnlp.652},
  pages = {11159--11170}
}