Predicting the geolocation of tweets using transformer models on customized data
DOI: https://doi.org/10.5311/JOSIS.2024.29.295
Keywords: geolocation prediction, transformers, machine learning, multitask learning, regression task, Gaussian mixture model, Twitter dataset
Abstract
This research aims to solve the tweet/user geolocation prediction task and to provide a flexible methodology for geotagging textual big data. The suggested approach uses neural networks for natural language processing (NLP) to estimate the location both as coordinate pairs (longitude, latitude) and as two-dimensional Gaussian mixture models (GMMs). The proposed models were fine-tuned on a Twitter dataset, using pretrained Bidirectional Encoder Representations from Transformers (BERT) as base models. Performance metrics show a median error of less than 30 km on the worldwide dataset and less than 15 km on the US-level datasets for the models trained and evaluated on text features from both the tweets' content and their metadata context. Our source code and data are available at https://github.com/K4TEL/geo-twitter.git.
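The abstract reports median errors in kilometres and mentions GMM outputs but does not give the formulas. The following is a minimal sketch of both ideas under stated assumptions: the error metric is assumed to be the great-circle (haversine) distance on a spherical Earth, and the GMM head is assumed to be a mixture of bivariate normals over (longitude, latitude) treated as planar coordinates in degrees, scored by negative log-likelihood. The function names `haversine_km`, `median_error_km` and `gmm_nll` are hypothetical and are not taken from the authors' repository.

```python
import math
from statistics import median

# Assumption: spherical Earth with mean radius 6371 km.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (longitude, latitude) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def median_error_km(predicted, actual):
    """Median haversine error over paired sequences of (lon, lat) tuples."""
    return median(haversine_km(*p, *t) for p, t in zip(predicted, actual))


def gmm_nll(point, weights, means, sigmas, rhos):
    """Negative log-likelihood of a true (lon, lat) point under a hypothetical 2D GMM.

    weights: mixture weights summing to 1
    means:   list of (mu_lon, mu_lat)
    sigmas:  list of (sigma_lon, sigma_lat), all > 0
    rhos:    list of correlation coefficients in (-1, 1)
    Assumption: coordinates are treated as planar (degrees), not on the sphere.
    """
    x, y = point
    density = 0.0
    for w, (mx, my), (sx, sy), r in zip(weights, means, sigmas, rhos):
        zx, zy = (x - mx) / sx, (y - my) / sy
        # Mahalanobis-style quadratic form of a correlated bivariate normal
        q = (zx * zx - 2 * r * zx * zy + zy * zy) / (1 - r * r)
        norm = 2 * math.pi * sx * sy * math.sqrt(1 - r * r)
        density += w * math.exp(-0.5 * q) / norm
    return -math.log(density)
```

For example, three predictions at (0, 0) against true points at latitudes 0, 1 and 2 degrees give errors of about 0, 111 and 222 km, so `median_error_km` returns roughly 111 km. Reporting the median rather than the mean keeps a few very distant misses from dominating the headline figure, and minimizing `gmm_nll` lets a model express uncertainty by spreading weight over several candidate regions instead of committing to one coordinate pair.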
License
Copyright (c) 2024 Kateryna Lutsai, Christoph H. Lampert
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
Articles in JOSIS are licensed under a Creative Commons Attribution 3.0 License.