Temporally relevant parallel top-k spatial keyword search
Keywords: spatial keyword search, spatio-textual, I/O-efficient indexing, top-k, temporal relevance
New spatio-textual indexing methods are needed to support efficient search and update of the massive amounts of spatially referenced text being generated. Location-based services using geo-tagged documents provide valuable ranked recommendations about nearby restaurants, services, sales, emergency events, and visitor attractions. Consequently, top-k spatial keyword search queries (TkSKQ) have received considerable attention from the research community. Several spatio-textual indexes have been proposed to support TkSKQ efficiently. Some of these indexes support updates from live document streams, but their ranking schemes do not simultaneously incorporate temporal relevance, textual similarity, and spatial proximity. Moreover, existing approaches have limited or no capability to exploit parallelism in document ingestion and query execution. We present a parallel spatio-textual index, Pastri, to address these issues. Pastri can be updated incrementally over real-time spatio-textual document streams. To support temporally relevant ranking of continuously generated document streams, we propose a dynamic ranking scheme. Our approach retrieves the top-k documents that are most temporally relevant at query execution time. We implemented Pastri and integrated it into a system with a persistent document store and several thread pools to exploit parallelism at various levels. Experimental evaluation on real-world datasets and on synthetic datasets that we generated demonstrates that our system sustains high document update throughput. Furthermore, Pastri's TkSKQ search is one to two orders of magnitude faster than that of other spatio-textual indexes.