Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0

Authors

De Toni, Francesco
Akiki, Christopher
de la Rosa, Javier
Fourrier, Clémentine
Manjavacas, Enrique
Schweter, Stefan
van Strien, Daniel

Publisher

Association for Computational Linguistics (ACL)

Abstract

In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.

Book Title

Challenges & Perspectives in Creating Large Language Models: Proceedings of BigScience Episode #5 Workshop

Entity type

Publication
