Search : [ keyword: learned index ] (2)

Beyond Traditional Search: SIMD-Optimized Correction for Learned Index

Yeojin Oh, Nakyeong Kim, Jongmoo Choi, Seehwan Yoo

http://doi.org/10.5626/JOK.2025.52.5.363

To address the limitations of traditional indexing techniques, this study examines the search performance of machine learning-based Learned Indexes, focusing on the read-only RMI and the modifiable ALEX. We propose a SIMD-based optimization technique to minimize the overhead incurred during the correction phase, which accounts for over 80% of the total search time. Learned Indexes operate in two phases: prediction and correction. In our experiments with RMI, we found that when the error range is large, the SIMD Branchless Binary Search, which can quickly narrow the search range, outperforms other methods. In contrast, when the error range is small, the model prediction-based SIMD Linear Search demonstrates superior performance. For ALEX, which maintains a relatively constant error range, the straightforward SIMD Linear Search proved the most efficient compared to more complex search techniques. These results underscore the importance of choosing the right search algorithm based on the dataset's error range, index size, and density to achieve optimal performance.
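To make the correction phase concrete, the sketch below shows a scalar branchless binary search over the model's predicted error window. This is an illustration of the general technique, not the paper's code: the paper's SIMD variants compare several probe points per step, and the function and variable names here (`correct`, `pos_pred`, `err`) are assumptions for the example.

```python
def correct(keys, pos_pred, err, target):
    """Correction phase of a learned index: the model predicts position
    pos_pred, and the true position is guaranteed to lie within +/- err.
    A branchless binary search narrows that window without data-dependent
    branches, which is what makes SIMD/vectorized variants practical."""
    lo = max(0, pos_pred - err)
    hi = min(len(keys), pos_pred + err + 1)
    base, n = lo, hi - lo
    while n > 1:
        half = n // 2
        # Branchless step: advance `base` by `half` only when the probe
        # key is still below the target (a multiply, not an if/else).
        base += half * (keys[base + half - 1] < target)
        n -= half
    return base if base < len(keys) and keys[base] == target else -1
```

When the error window is small, a simple linear scan from `lo` to `hi` (the paper's SIMD Linear Search, vectorized) avoids even this loop's overhead, which matches the trade-off the abstract reports.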

Improving False Positive Rate of Extended Learned Bloom Filters Using Grid Search

Soohyun Yang, Hyungjoo Kim

http://doi.org/10.5626/JOK.2022.49.1.78

A Bloom filter is a data structure that represents a set and answers whether a given datum is a member. In exchange for using less space, it may return false positives. The learned Bloom filter is a variant that uses a machine learning model as a pre-processing step to improve the false-positive rate: some data are captured by the machine learning model, and the leftover data are stored in an auxiliary filter. The auxiliary filter can be implemented with a Bloom filter alone, but in this paper we combine a Bloom filter with a learned hash function, and call the result an extended learned Bloom filter. The learned hash function uses the output value of the machine learning model as a hash function. We propose a method that improves the false-positive rate of the extended learned Bloom filter through grid search: it finds the extended learned Bloom filter with the lowest false-positive rate by sweeping the hyperparameter that controls the ratio allotted to the learned hash function. We show experimentally that the extended learned Bloom filter selected through grid search achieves a 20% improvement in false-positive rate over the learned Bloom filter when storing more than 100,000 items. In addition, we show that false-negative errors can occur in the learned hash function due to the use of 32-bit floating point in the neural network model; switching to 64-bit floating point resolves this. Finally, in an experiment querying 10,000 items, we show that adjusting the structure of the neural network model yields an extended learned Bloom filter with the same false-positive rate while saving 20 KB of space, at the cost of a 2% increase in query time.
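The auxiliary-filter construction and the grid search over the learned-hash ratio can be sketched as follows. This is a minimal illustration under stated assumptions: the neural model is replaced by a deterministic stand-in score in [0, 1), and all names (`TinyBloom`, `ExtendedFilter`, `fake_model`, `grid_search`) are hypothetical, not from the paper.

```python
import hashlib

def bloom_hashes(item, k, m):
    # Derive k bit positions from a cryptographic digest (illustrative choice).
    d = hashlib.sha256(item.encode()).digest()
    return [int.from_bytes(d[4 * i:4 * i + 4], "big") % m for i in range(k)]

class TinyBloom:
    def __init__(self, m, k):
        self.bits, self.m, self.k = bytearray(m), m, k
    def add(self, item):
        for h in bloom_hashes(item, self.k, self.m):
            self.bits[h] = 1
    def query(self, item):
        return all(self.bits[h] for h in bloom_hashes(item, self.k, self.m))

def fake_model(x):
    # Stand-in for the neural model: a deterministic score in [0, 1).
    return int.from_bytes(hashlib.md5(x.encode()).digest()[:4], "big") / 2 ** 32

class ExtendedFilter:
    """Auxiliary filter = plain Bloom filter AND a learned-hash bit array,
    where the model's output picks the learned-hash bit position."""
    def __init__(self, keys, model, m_bloom, m_learned, k):
        self.bloom = TinyBloom(m_bloom, k)
        self.learned_bits = bytearray(m_learned)
        self.model, self.m_learned = model, m_learned
        for x in keys:
            self.bloom.add(x)
            self.learned_bits[int(model(x) * (m_learned - 1))] = 1
    def query(self, x):
        # Membership requires both structures to agree, lowering the FPR.
        return (self.bloom.query(x)
                and self.learned_bits[int(self.model(x) * (self.m_learned - 1))] == 1)

def grid_search(keys, negatives, m_total, k, ratios):
    """Sweep the ratio of bits given to the learned hash array and keep
    the configuration with the lowest empirical false-positive rate."""
    best = None
    for r in ratios:
        m_learned = max(1, int(m_total * r))
        f = ExtendedFilter(keys, fake_model, m_total - m_learned, m_learned, k)
        fpr = sum(f.query(x) for x in negatives) / len(negatives)
        if best is None or fpr < best[0]:
            best = (fpr, r)
    return best
```

Because the stand-in model and the query path use the same double-precision arithmetic, this sketch has no false negatives; the 32-bit-float false-negative issue the abstract reports arises when the model's stored and queried outputs are rounded differently.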


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr