Predicting the geolocation of tweets using transformer models on customized data

Authors

  • Kateryna Lutsai, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"
  • Christoph H. Lampert, Institute of Science and Technology Austria, https://orcid.org/0000-0001-8622-7887

DOI:

https://doi.org/10.5311/JOSIS.2024.29.295

Keywords:

geolocation prediction, transformers, machine learning, multitask learning, regression task, Gaussian mixture model, Twitter dataset

Abstract

This research aims to solve the tweet/user geolocation prediction task and to provide a flexible methodology for geo-tagging textual big data. The suggested approach uses neural networks for natural language processing (NLP) to estimate the location as coordinate pairs (longitude, latitude) and as two-dimensional Gaussian Mixture Models (GMMs). The proposed models were fine-tuned on a Twitter dataset using pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models. Performance metrics show a median error of less than 30 km on the worldwide dataset and less than 15 km on the US dataset for models trained and evaluated on text features of the tweets' content and metadata context. Our source code and data are available at https://github.com/K4TEL/geo-twitter.git.
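
To make the reported metric concrete, the sketch below shows one common way to compute a median error in kilometres between predicted and true (longitude, latitude) pairs. It assumes great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km. The function names `haversine_km` and `median_error_km` are hypothetical, and the abstract does not state that the authors compute distances this way.

```python
import math
from statistics import median

# Assumption: mean Earth radius in km (spherical Earth model).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def median_error_km(predicted, actual):
    """Median haversine distance over paired (longitude, latitude) tuples."""
    return median(haversine_km(*p, *t) for p, t in zip(predicted, actual))
```

A call such as `median_error_km(predicted_pairs, true_pairs)` takes two equally long sequences of (longitude, latitude) tuples in degrees, matching the coordinate order given in the abstract.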

Author Biography

Christoph H. Lampert, Institute of Science and Technology Austria

Since 04/2015: Professor, Institute of Science and Technology Austria, Klosterneuburg

Published

2024-12-26

Issue

Section

Research Articles