Friðriksdóttir & Einarsson (2022), Bootstrapping Icelandic Knowledge Graph Data

conference contribution
posted on 25.07.2022, 11:45 authored by Steinunn Rut Friðriksdóttir, Hafsteinn Einarsson

Proceedings of the ESSLLI 2022 Student Session

Bootstrapping Icelandic Knowledge Graph Data

Steinunn Rut Friðriksdóttir & Hafsteinn Einarsson

A knowledge graph is a semantic network of named entities, such as people, objects, and organizations, that can be used to uniquely identify mentions in text. Creating such a graph requires a large amount of annotated data that covers not only the entities themselves but also the relations that hold between them. Traditionally, such data has been available only for high-resource languages. In this paper, we present an approach to bootstrapping training data using machine translation and open relation extraction methods. We hypothesize that by automatically translating our data into English, we can perform relation extraction with state-of-the-art language models and then translate the entities back into the source language, significantly reducing startup costs when developing such models for a given language. Our results show that this approach holds promise for lower-resource languages such as Icelandic. However, it is currently limited by the quality of translation and open relation extraction models.
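
The translate, extract, and back-translate loop described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' implementation: the function `bootstrap_triples`, the toy dictionaries, and the rule-based `toy_extract` are all hypothetical stand-ins for a real machine translation system and a real open relation extraction model. Leaving the relation label in English is likewise an assumption, based on the abstract mentioning only that entities are translated back.

```python
from typing import Callable, List, Tuple

# (subject, relation, object)
Triple = Tuple[str, str, str]


def bootstrap_triples(
    sentences: List[str],
    to_english: Callable[[str], str],
    extract: Callable[[str], List[Triple]],
    to_source: Callable[[str], str],
) -> List[Triple]:
    """Translate each sentence to English, extract relation triples,
    then translate only the entities back to the source language."""
    results: List[Triple] = []
    for sentence in sentences:
        english = to_english(sentence)
        for subj, rel, obj in extract(english):
            # Assumption: the relation label stays in English.
            results.append((to_source(subj), rel, to_source(obj)))
    return results


# Hypothetical toy stand-ins for real MT and relation extraction models.
_IS_TO_EN = {
    "Reykjavík er höfuðborg Íslands.": "Reykjavik is the capital of Iceland.",
}
_EN_TO_IS = {"Reykjavik": "Reykjavík", "Iceland": "Ísland"}


def toy_to_english(text: str) -> str:
    # Dictionary lookup in place of a machine translation model.
    return _IS_TO_EN.get(text, text)


def toy_to_source(text: str) -> str:
    # Dictionary lookup in place of entity back-translation.
    return _EN_TO_IS.get(text, text)


def toy_extract(text: str) -> List[Triple]:
    # A single hard-coded pattern in place of an open relation extractor.
    marker = " is the capital of "
    if marker not in text:
        return []
    subj, obj = text.rstrip(".").split(marker, 1)
    return [(subj, "capital_of", obj)]


triples = bootstrap_triples(
    ["Reykjavík er höfuðborg Íslands."],
    toy_to_english,
    toy_extract,
    toy_to_source,
)
print(triples)
```

Passing the translation and extraction steps in as callables keeps the pipeline independent of any particular model, so either component could be swapped out. As the abstract notes, the quality of the output is bounded by the quality of both.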
