Efficient indexing of hashtags using bitmap indices

Lawan Thamsuhang Subba, Christian Thomsen, Torben Bach Pedersen

Publication: Contribution to journal › Conference article in journal › Research › peer review


Abstract

The enormous amounts of data being generated regularly mean that rapidly accessing relevant data from data stores is just as important as storing it. This study focuses on the use of a distributed bitmap indexing framework to reduce query execution times in distributed data warehouses. Previous solutions for bitmap indexing at a distributed scale are rigid in their implementation, use a single compression algorithm, and provide their own mechanisms to store, distribute, and retrieve the indices. Users are locked into these implementations even when other alternatives for compression and index storage are available or desirable. We provide an open source, lightweight, and flexible distributed bitmap indexing framework in which the mechanism used to search for keywords to index, the bitmap compression algorithm, and the key-value store holding the indices are all easily interchangeable. Using Roaring bitmaps for compression, HBase as the key-value store, and an updated version of Apache ORC that uses bitmap indices, added to Apache Hive, we demonstrate that although index creation incurs some runtime overhead, searches for hashtags and their combinations in tweets can be greatly accelerated.
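
To make the idea in the abstract concrete, the following is a minimal single-machine sketch of a bitmap index over hashtags. It is not the authors' framework: the bitmaps are plain, uncompressed Python integers rather than Roaring bitmaps, an in-memory dictionary stands in for the key-value store (HBase in the paper), and the names `build_index`, `search_all`, `search_any`, and the sample tweets are hypothetical, chosen only for illustration.

```python
import re
from collections import defaultdict

# Assumption: a hashtag is '#' followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def build_index(tweets):
    """Map each hashtag to an integer whose bit i is set when tweet i contains it."""
    index = defaultdict(int)
    for row, text in enumerate(tweets):
        for tag in HASHTAG.findall(text.lower()):
            index[tag] |= 1 << row
    return dict(index)  # stand-in for a key-value store


def rows(bitmap):
    """Return the positions of the set bits of a non-negative bitmap, ascending."""
    result = []
    row = 0
    while bitmap:
        if bitmap & 1:
            result.append(row)
        bitmap >>= 1
        row += 1
    return result


def search_all(index, tags):
    """Rows that contain every given hashtag (bitwise AND of the bitmaps)."""
    if not tags:
        return []
    bitmap = index.get(tags[0].lower(), 0)
    for tag in tags[1:]:
        bitmap &= index.get(tag.lower(), 0)
    return rows(bitmap)


def search_any(index, tags):
    """Rows that contain at least one given hashtag (bitwise OR of the bitmaps)."""
    bitmap = 0
    for tag in tags:
        bitmap |= index.get(tag.lower(), 0)
    return rows(bitmap)


# Hypothetical sample data.
tweets = [
    "Loving the #DOLAP workshop in #Lisbon",
    "#lisbon sunsets",
    "Reading about #bitmap indices at #DOLAP",
    "no tags here",
]
index = build_index(tweets)

print(search_all(index, ["dolap", "lisbon"]))   # [0]
print(search_any(index, ["lisbon", "bitmap"]))  # [0, 1, 2]
```

Combinations of hashtags reduce to bitwise operations on the per-hashtag bitmaps, so a query never has to scan the tweet text itself. A compressed bitmap format and a distributed store would replace the integer and the dictionary in a realistic setting.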
Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2324
Number of pages: 10
ISSN: 1613-0073
Status: Published - 20 Mar 2019
Event: 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2019 - Lisbon, Portugal
Duration: 26 Mar 2019 → …

Conference

Conference: 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2019
Country/Territory: Portugal
City: Lisbon
Period: 26/03/2019 → …
