The enormous amounts of data being generated regularly mean that rapidly accessing relevant data in data stores is just as important as storing it. This study focuses on the use of a distributed bitmap indexing framework to accelerate query execution in distributed data warehouses. Previous solutions for bitmap indexing at a distributed scale are rigid in their implementation, use a single compression algorithm, and provide their own mechanisms to store, distribute, and retrieve the indices. Users are locked into these implementations even when other alternatives for compression and index storage are available or desirable. We provide an open source, lightweight, and flexible distributed bitmap indexing framework in which the mechanism that searches for keywords to index, the bitmap compression algorithm, and the key-value store used for the indices are all easily interchangeable. Using Roaring bitmaps for compression, HBase as the key-value store, and an updated version of Apache ORC, added to Apache Hive, that uses bitmap indices, we demonstrate that, although index creation adds some runtime overhead, searches for hashtags and their combinations in tweets can be greatly accelerated.
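To make the idea of interchangeable components concrete, the following is a minimal single-process sketch, not the framework's actual code. All names here (`InMemoryStore`, `extract_hashtags`, `BitmapIndex`) are hypothetical. A plain Python integer stands in for a bitmap, so no Roaring compression is applied, and a dictionary stands in for a key-value store such as HBase.

```python
from typing import Callable, Dict, Iterable, List


class InMemoryStore:
    """Hypothetical dict-backed key-value store (stand-in for e.g. HBase)."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def get(self, key: str) -> int:
        # A missing key behaves like an empty bitmap.
        return self._data.get(key, 0)

    def put(self, key: str, value: int) -> None:
        self._data[key] = value


def extract_hashtags(text: str) -> List[str]:
    """Hypothetical keyword extractor: lower-cased tokens that start with '#'."""
    return [t.lower() for t in text.split() if t.startswith("#") and len(t) > 1]


class BitmapIndex:
    """Maps each keyword to a bitmap whose set bits are matching row ids."""

    def __init__(self, store: InMemoryStore,
                 extractor: Callable[[str], List[str]] = extract_hashtags) -> None:
        # Both the store and the extractor can be swapped for other implementations.
        self.store = store
        self.extractor = extractor

    def add(self, row_id: int, text: str) -> None:
        # Set bit `row_id` in the bitmap of every keyword found in the text.
        for key in set(self.extractor(text)):
            self.store.put(key, self.store.get(key) | (1 << row_id))

    def search_all(self, keys: Iterable[str]) -> List[int]:
        # Rows containing every keyword: bitwise AND of the keyword bitmaps.
        bitmaps = [self.store.get(k.lower()) for k in keys]
        if not bitmaps:
            return []
        combined = bitmaps[0]
        for bitmap in bitmaps[1:]:
            combined &= bitmap
        return [i for i in range(combined.bit_length()) if (combined >> i) & 1]
```

Searching for a combination of hashtags reduces to a bitwise AND over the stored bitmaps, which is why such lookups can avoid scanning the rows themselves. Swapping the store or the extractor requires only passing a different object with the same methods.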
|Journal||CEUR Workshop Proceedings|
|Status||Published - 20 Mar 2019|
|Event||21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2019 - Lisbon, Portugal|
Duration: 26 Mar 2019 → …
|Conference||21st International Workshop on Design, Optimization, Languages and Analytical Processing of Big Data, DOLAP 2019|
|Period||26/03/2019 → …|