Search : [ keyword: data structure ] (6)

Improving False Positive Rate of Extended Learned Bloom Filters Using Grid Search

Soohyun Yang, Hyungjoo Kim

http://doi.org/10.5626/JOK.2022.49.1.78

Bloom filter is a data structure that represents a set and returns whether data is included or not. However, there are cases in which false positives are returned at the cost of using less space. The learned bloom filter is a variation of the bloom filter, that uses a machine learning model in the pre-processing process to improve the false-positive rate. The learned bloom filter stores some data in the machine learning model, and the leftover data is stored in the auxiliary filter. An auxiliary filter can be implemented by using a bloom filter only, but in this paper, we use the bloom filter and the learned hash function, and this is called an extended learned bloom filter. The learned hash function uses the output value of the machine learning model as a hash function. In this paper, we propose a method that improves the false positive rate of the extended learned bloom filter through grid search. This method explores the extended learned bloom filter with the lowest false positive rate, by increasing the hyperparameter that represents the ratio of the learned hash function. As a result, we experimentally show that the extended learned bloom filter selected through grid search, can have a 20% improvement in false-positive rate compared to the learned bloom filter, in the experiment that needs more than 100,000 data to store. In addition, we also show that the false negative error may occur in the learned hash function by the use of 32-bit floating points in the neural network model. This can be solved by changing the floating points to 64-bit. Finally, we show that in an experiment where we query 10,000 data, we can adjust the structure of the neural network model to save 20KB of space and create an extended learned bloom filter with the same false-positive rate. However, the query time is increased by 2% at the cost of saving 20KB of space.

Design of Durable Node Replication for Persistent Memory Data Structures on NUMA Architectures

Junghan Kim, Young Ik Eom

http://doi.org/10.5626/JOK.2022.49.1.8

Recently, advances in persistent memory and NUMA technologies have allowed for the provision of high performance and large storage space to the applications such as big data and machine learning. Such PM environments on multi-node systems require a change in the data structures, which are being used in each layer of the software stack. In terms of the research on PM data structures, however, it is a difficult problem to ensure high level of concurrency as well as non-volatility which is an important characteristics of NUMA and PM, respectively. In this paper, we propose an NRPM that extends the node replication, which is a representative of NUMA algorithms. NRPM outperforms hash algorithm by up to 5x by improving concurrency in the multi-node PM server using shared-log and flat combining methods. We confirmed the validity of NRPM through various performance analyses considering the characteristics of NUMA-PM.

Instagram User Embedding and Fashion Photo Recommendation Using "likes" of Fashion Photos

Jaeyoung Lee, Younghoon Kim

http://doi.org/10.5626/JOK.2021.48.11.1235

As individual preference of fashion styles diversifies, demands for research recommending personalized fashion are increasing. Recently, with the development of deep learning technology, many studies have been conducted using deep learning to extract features from fashion photos and use them for recommendations. In this work, we exploit social network data to consider users and fashion styles in recommending fashion photos. Since social network users tend to post fashion photos in their preferred style and tag them with “Like“, social network data are very important for understanding relationship between users and fashion photos. We propose a technique to map users and fashion photos into the same vector space using social network data structure which consists of users and fashion photos. Especially, it is possible to use our method to recommend fashion photos that a user might prefer by mapping users and fashion photos not used for learning into a vector space without additional learning.

Opt Tree: Write Optimized Tree Using Optane DCPM Internal Buffer

Jonghyeon Yoo, Beomseok Nam

http://doi.org/10.5626/JOK.2021.48.7.742

Intel’s Optane DC Persistent Memory, a recently commercialized non-volatile byte-addressable memory, has an internal buffer of 256 bytes called XPLine, which processes memory access commands in units of cache lines or words. In this paper, we propose Opt Tree, a novel byte-addressable persistent index that utilizes the internal buffer of the Optane DCPM. Opt Tree divides the tree node into several small blocks of 256 bytes. For insertions and searches, Opt Tree accesses only two blocks. In our performance study, Opt Tree shows better insertion performance than the existing persistent indexes through its internal buffer-friendly design.

Space Efficient Top-k Query Encoding Based on Data Distribution

Wooyoung Park, Srinivasa Rao Satti

http://doi.org/10.5626/JOK.2020.47.3.235

We consider an encoding that supports a range top-k query on a two-dimensional array without accessing the original array. We propose a more space-efficient encoding method for top-k query with better average-case query time. Our experiments also show that our encoding is more space-efficient than the earlier ones. Also, based on the learning-based data structure, we propose the use of the learning-based data structure on succinct data structures.

Efficient Authentication of Aggregation Queries for Outsourced Databases

Jongmin Shin, Kyuseok Shim

http://doi.org/10.5626/JOK.2017.44.7.703

Outsourcing databases is to offload storage and computationally intensive tasks to the third party server. Therefore, data owners can manage big data, and handle queries from clients, without building a costly infrastructure. However, because of the insecurity of network systems, the third-party server may be untrusted, thus the query results from the server may be tampered with. This problem has motivated significant research efforts on authenticating various queries such as range query, kNN query, function query, etc. Although aggregation queries play a key role in analyzing big data, authenticating aggregation queries has not been extensively studied, and the previous works are not efficient for data with high dimension or a large number of distinct values. In this paper, we propose the AMR-tree that is a data structure, applied to authenticate aggregation queries. We also propose an efficient proof construction method and a verification method with the AMR-tree. Furthermore, we validate the performance of the proposed algorithm by conducting various experiments through changing parameters such as the number of distinct values, the number of records, and the dimension of data.


Search




Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr