Efficient indexing of hashtags using bitmap indices

Lawan Thamsuhang Subba, Christian Thomsen, Torben Bach Pedersen

Research output: Contribution to journal, Conference article in Journal, Research, peer-review


Abstract

The enormous amounts of data being generated regularly mean that rapidly accessing relevant data in data stores is just as important as storing it. This study focuses on the use of a distributed bitmap indexing framework to accelerate query execution in distributed data warehouses. Previous solutions for bitmap indexing at a distributed scale are rigid in their implementation, use a single compression algorithm, and provide their own mechanisms to store, distribute, and retrieve the indices. Users are locked into these implementations even when other compression and index storage alternatives are available or desirable. We provide an open-source, lightweight, and flexible distributed bitmap indexing framework in which the mechanism for searching for keywords to index, the bitmap compression algorithm, and the key-value store used for the indices are easily interchangeable. Using Roaring bitmaps for compression, HBase for storing key-values, and an updated version of Apache ORC that uses bitmap indices added to Apache Hive, we demonstrate that although index creation incurs some runtime overhead, searches for hashtags and their combinations in tweets can be greatly accelerated.
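The basic idea of a hashtag bitmap index can be sketched as follows. This is a minimal illustration, not the authors' framework: it assumes a simple `#\w+` pattern for hashtag extraction, uses Python integers as uncompressed bitsets in place of Roaring bitmaps, and uses a plain `dict` (any mutable mapping) in place of HBase. The names `extract_hashtags`, `to_row_ids`, and `HashtagBitmapIndex` are hypothetical. Each hashtag maps to a bitmap in which bit *i* is set when row *i* contains that hashtag, so a query for a combination of hashtags reduces to a bitwise AND (all tags) or OR (any tag) of their bitmaps.

```python
import re
from collections.abc import MutableMapping
from typing import Optional

# Assumed hashtag syntax: '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text: str) -> set[str]:
    """Return the lower-cased hashtags (without '#') found in a tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def to_row_ids(bitmap: int) -> list[int]:
    """Decode a bitset into the ascending list of row ids whose bits are set."""
    ids, row = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(row)
        bitmap >>= 1
        row += 1
    return ids


class HashtagBitmapIndex:
    """Hypothetical index: hashtag -> bitset of row ids.

    Python ints act as uncompressed bitsets (standing in for Roaring bitmaps),
    and `store` is any mutable mapping (standing in for a key-value store
    such as HBase).
    """

    def __init__(self, store: Optional[MutableMapping[str, int]] = None):
        self.store = store if store is not None else {}

    def add_row(self, row_id: int, text: str) -> None:
        # Set bit `row_id` in the bitmap of every hashtag in this row.
        for tag in extract_hashtags(text):
            self.store[tag] = self.store.get(tag, 0) | (1 << row_id)

    def _bitmap(self, tag: str) -> int:
        # Normalize the query tag; unknown hashtags have an empty bitmap.
        return self.store.get(tag.lstrip("#").lower(), 0)

    def rows_with_all(self, *tags: str) -> list[int]:
        """Rows containing every given hashtag (bitwise AND of bitmaps)."""
        if not tags:
            return []
        result = self._bitmap(tags[0])
        for tag in tags[1:]:
            result &= self._bitmap(tag)
        return to_row_ids(result)

    def rows_with_any(self, *tags: str) -> list[int]:
        """Rows containing at least one given hashtag (bitwise OR)."""
        result = 0
        for tag in tags:
            result |= self._bitmap(tag)
        return to_row_ids(result)


if __name__ == "__main__":
    tweets = [
        "Off to #DOLAP2019 in #Lisbon",
        "#BigData talk at #DOLAP2019",
        "Sunny day in #Lisbon",
    ]
    index = HashtagBitmapIndex()
    for row_id, text in enumerate(tweets):
        index.add_row(row_id, text)
    print(index.rows_with_all("#DOLAP2019", "#Lisbon"))  # [0]
    print(index.rows_with_any("#BigData", "#Lisbon"))    # [0, 1, 2]
```

Because the store is passed in rather than hard-coded, replacing the `dict` with another mapping-like store mirrors, in miniature, the interchangeable key-value store the abstract describes; a realistic deployment would also replace the integer bitsets with a compressed format such as Roaring.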
Original language: English
Journal: CEUR Workshop Proceedings
Volume: 2324
Number of pages: 10
ISSN: 1613-0073
Publication status: Published - 20 Mar 2019
Event: 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2019 - Lisbon, Portugal
Duration: 26 Mar 2019 → …

Conference

Conference: 21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2019
Country/Territory: Portugal
City: Lisbon
Period: 26/03/2019 → …
