Friðriksdóttir & Einarsson (2022), Bootstrapping Icelandic Knowledge Graph Data
Proceedings of the ESSLLI 2022 Student Session
Bootstrapping Icelandic Knowledge Graph Data
Steinunn Rut Friðriksdóttir & Hafsteinn Einarsson
A knowledge graph is a semantic network of named entities (e.g. people, objects and organizations) that can be used to uniquely identify mentions in text. Creating such a graph requires large amounts of annotated data that cover not only the entities themselves but also the relations that hold between them. Traditionally, such data has only been available for high-resource languages. In this paper, we present an approach to bootstrapping training data using machine translation and open relation extraction methods. We hypothesize that by automatically translating our data into English, we can perform relation extraction using state-of-the-art (SOTA) language models and then translate the extracted entities back into the source language, significantly reducing the startup costs of developing such models for a given language. Our results show that this approach holds promise for lower-resource languages such as Icelandic. However, it is currently limited by the quality of the translation and open relation extraction models.
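The pipeline described in the abstract (translate into English, extract relations, translate the entities back) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Triple`, `bootstrap_triples` and the `toy_*` helpers are hypothetical names, and the dictionary lookups and single regular expression merely stand in for a real machine translation system and a real open relation extraction model. Following the abstract, only the entities are mapped back to the source language, while relation labels remain in English.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Triple:
    """A (subject, relation, object) edge in the knowledge graph."""
    subject: str
    relation: str
    obj: str


def bootstrap_triples(
    sentences: Iterable[str],
    to_english: Callable[[str], str],
    extract: Callable[[str], List[Triple]],
    to_source: Callable[[str], str],
) -> List[Triple]:
    """Translate each source sentence into English, run relation extraction
    on the English text, then map the extracted entities back into the
    source language. Relation labels are left in English."""
    triples: List[Triple] = []
    for sentence in sentences:
        english = to_english(sentence)
        for t in extract(english):
            triples.append(Triple(to_source(t.subject), t.relation, to_source(t.obj)))
    return triples


# --- Hypothetical toy components standing in for real MT / open RE models ---
IS_TO_EN = {"Reykjavík er höfuðborg Íslands.": "Reykjavík is the capital of Iceland."}
EN_TO_IS = {"Iceland": "Ísland"}


def toy_to_english(text: str) -> str:
    # Lookup table in place of an Icelandic-to-English MT system.
    return IS_TO_EN.get(text, text)


def toy_to_source(entity: str) -> str:
    # Lookup table in place of English-to-Icelandic entity back-translation;
    # unknown entities are returned unchanged.
    return EN_TO_IS.get(entity, entity)


def toy_extract(text: str) -> List[Triple]:
    # One hand-written pattern in place of an open relation extraction model.
    m = re.match(r"(.+?) is the capital of (.+?)\.?$", text)
    return [Triple(m.group(1), "capital of", m.group(2))] if m else []


if __name__ == "__main__":
    result = bootstrap_triples(
        ["Reykjavík er höfuðborg Íslands."], toy_to_english, toy_extract, toy_to_source
    )
    print(result)
```

Passing the three components in as callables keeps the pipeline independent of any particular translation or extraction model, so either can be replaced as better models for Icelandic become available.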