Journal of KIISE

Search : [ keyword: Indexing ] (5)

Intel’s Optane DC Persistent Memory (DCPM), a recently commercialized byte-addressable non-volatile memory, has a 256-byte internal buffer called XPLine, which processes memory access commands in units of cache lines or words. In this paper, we propose Opt Tree, a novel byte-addressable persistent index that exploits this internal buffer. Opt Tree divides each tree node into several small blocks of 256 bytes, and an insertion or a search accesses only two of these blocks. In our performance study, Opt Tree shows better insertion performance than existing persistent indexes thanks to its buffer-friendly design.
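The block arithmetic behind the "two blocks per operation" claim can be illustrated with a minimal sketch. The abstract does not describe the node layout, so everything here other than the 256-byte block size is an assumption made for illustration: a hypothetical header block at index 0, 16-byte entries, and the helper names `blocks_touched`, `entry_offset`, and `blocks_for_lookup`.

```python
BLOCK_SIZE = 256                      # buffer size stated in the abstract
ENTRY_SIZE = 16                       # assumption: 8-byte key + 8-byte value
ENTRIES_PER_BLOCK = BLOCK_SIZE // ENTRY_SIZE


def blocks_touched(offset, length):
    """Return the indices of the 256-byte blocks covered by a byte range."""
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return set(range(first, last + 1))


def entry_offset(data_block, slot):
    """Byte offset of an entry; block 0 is a hypothetical header block."""
    return (1 + data_block) * BLOCK_SIZE + slot * ENTRY_SIZE


def blocks_for_lookup(data_block, slot):
    """Blocks read when a lookup consults the header and then one entry."""
    header = blocks_touched(0, BLOCK_SIZE)
    entry = blocks_touched(entry_offset(data_block, slot), ENTRY_SIZE)
    return header | entry
```

Because entries are aligned to 16 bytes and 256 is a multiple of 16, an entry never straddles a block boundary in this layout, so `blocks_for_lookup` always returns exactly two block indices. An unaligned range such as `blocks_touched(250, 16)` would span two blocks on its own.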

A Trie-based Indexing Scheme for Efficient Retrieval of Massive Spatio-Temporal IoT Sensor Data

Hawon Chu, Young-Kyoon Suh, Ryong Lee, Minwoo Park, Rae-Young Jang, Sang-Hwan Lee, Sa-Kwang Song

http://doi.org/10.5626/JOK.2020.47.12.1199

As Internet-of-Things (IoT) sensors with enhanced communication technology and computing power have come into wide use in many areas, a great deal of spatio-temporal data is being generated continuously. Thanks to remarkable advances in storage technology, it is possible to collect such massive data into storage systems for further high-dimensional analysis. That said, locating stored IoT data within a reasonable amount of time remains very challenging because of the heavy volume and the complex spatial and temporal attributes of the data. To address this concern, we propose a novel scalable indexing scheme, termed ST-Trie, to support efficient querying of massive spatio-temporal data collected from IoT sensors. The key idea of our scheme is to encode three-dimensional spatio-temporal information into one-dimensional keys in consideration of time and space locality and then organize the keys into a logical trie structure. In our experiments with real datasets, the proposed scheme outperformed composite indexes by up to 92 times on average in terms of query response time. In particular, we confirmed that ST-Trie scaled much better than the compared indexes as the queried time range increased.
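The general idea of mapping three dimensions to one-dimensional keys and storing them in a trie can be sketched as follows. The abstract does not specify the encoding, so this sketch assumes a simple bit-interleaving (Z-order) of quantized time, longitude, and latitude; the functions `quantize`, `interleave`, and `encode`, the `Trie` class, and the 8-bit resolution are all hypothetical and are not taken from the paper.

```python
def quantize(value, lo, hi, bits):
    """Map a value in [lo, hi] to an integer in [0, 2**bits - 1]."""
    cells = (1 << bits) - 1
    ratio = (value - lo) / (hi - lo)
    return max(0, min(cells, int(ratio * cells)))


def interleave(t, x, y, bits):
    """Interleave the bits of three integers, most significant bit first."""
    key = 0
    for i in range(bits - 1, -1, -1):
        for v in (t, x, y):
            key = (key << 1) | ((v >> i) & 1)
    return key


def encode(ts, lat, lon, bits=8, t_range=(0, 86400)):
    """Encode (time, latitude, longitude) as a binary string key."""
    t = quantize(ts, t_range[0], t_range[1], bits)
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return format(interleave(t, x, y, bits), "0{}b".format(3 * bits))


class Trie:
    """Binary trie over key strings; records live under a 'records' entry."""

    def __init__(self):
        self.root = {}

    def insert(self, key, record):
        node = self.root
        for ch in key:
            node = node.setdefault(ch, {})
        node.setdefault("records", []).append(record)

    def prefix_search(self, prefix):
        """Return every record whose key starts with the given prefix."""
        node = self.root
        for ch in prefix:
            if ch not in node:
                return []
            node = node[ch]
        out, stack = [], [node]
        while stack:
            cur = stack.pop()
            for k, v in cur.items():
                if k == "records":
                    out.extend(v)
                else:
                    stack.append(v)
        return out
```

With interleaved keys, records that are close in both time and space tend to share a long key prefix, so a range query can be approximated by a small number of prefix lookups. How ST-Trie actually balances time locality against space locality is not covered by this sketch.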

An Efficient Distributed In-memory High-dimensional Indexing Scheme for Content-based Image Retrieval in Spark Environments

Dojin Choi, Songhee Park, Yeondong Kim, Jiwon Wee, Hyeonbyeong Lee, Jongtae Lim, Kyoungsoo Bok, Jaesoo Yoo

http://doi.org/10.5626/JOK.2020.47.1.95

Content-based image retrieval, which searches for an object in images, has been utilized for criminal activity monitoring and object tracking in video. In this paper, we propose a distributed in-memory high-dimensional indexing scheme for content-based image retrieval. It provides similarity search over massive feature vectors extracted from images or objects. In order to process a large amount of data, we utilize a big data platform called Spark. Moreover, we employ a master/slave model to allocate distributed query processing efficiently: the master distributes data and queries, and the slaves index and process them. To solve the k-NN query processing performance problems of existing distributed high-dimensional indexing schemes, we propose optimization methods for k-NN query processing that consider density and search costs. We conduct various performance evaluations to demonstrate the superiority of the proposed scheme.
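The master/slave division of a k-NN query can be sketched in plain Python. This is a simplified illustration and not the proposed scheme: it uses a brute-force scan in place of an index, a sequential loop in place of Spark's parallel execution, and it does not model the density-based or cost-based optimizations. The function names `local_knn` and `distributed_knn` are hypothetical.

```python
import heapq
import math


def local_knn(partition, query, k):
    """Worker side: the k nearest (distance, id) pairs in one partition."""
    scored = ((math.dist(vec, query), vid) for vid, vec in partition)
    return heapq.nsmallest(k, scored)


def distributed_knn(partitions, query, k):
    """Master side: merge per-partition candidates into the global top k."""
    candidates = []
    for part in partitions:  # on a cluster this would run in parallel
        candidates.extend(local_knn(part, query, k))
    return heapq.nsmallest(k, candidates)
```

Each worker returns at most k candidates, so the master merges at most k times the number of partitions entries regardless of the data size. The merged result is exact because every global nearest neighbor is also among the k nearest within its own partition.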

Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI

Han-mook Yoo, Han-joon Kim, Jae-young Chang

http://doi.org/10.5626/JOK.2017.44.11.1236

In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant key words from a set of clusters with a mutual information metric, and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, decomposes documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant key words are identified by evaluating their mutual information within clusters. Then, the method calculates the similarities between the extracted key words using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the performance of the proposed method, we used travel-related blog data and showed that the proposed method outperforms the existing TextRank algorithm by about 14% in terms of accuracy.

A Labeling Methods for Keyword Search over Large XML Documents

Dong-Han Sun, Soo-Chan Hwang

http://doi.org/

As XML documents are getting bigger and more complex, a keyword-based search method that does not require structural information is needed to search these large XML documents. To use this method, not only must all keywords expressed as nodes in the XML document be labeled for indexing, but the structural information must also be well represented. However, the existing labeling methods either keep only very simple information about the XML documents in the index or represent the structural information in a way that copes poorly with the growth of the XML documents' size. As XML documents get larger, this causes either poor keyword search performance or an exponential increase in space usage. In this paper, we present the Repetitive Prime Labeling Scheme (RPLS) to address these problems of the existing labeling methods for keyword-based search of large XML documents. This method is based on the existing prime number labeling method and allows a parent's prime number to be used repeatedly at a lower level, so that the number of prime numbers being generated can be reduced. Then, we show experimental results comparing our method with the existing methods.
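The prime number labeling idea that RPLS builds on can be sketched as follows: each node receives the product of its parent's label and a fresh prime, so an ancestor test reduces to a divisibility check. This sketch shows only that baseline idea; the way RPLS reuses a parent's prime at lower levels is not detailed in the abstract and is not reproduced here. The names `primes`, `label_tree`, and `is_ancestor` and the example tree are hypothetical.

```python
from itertools import count


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for a small demo)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def label_tree(tree, root):
    """Label each node with its parent's label times a fresh prime.

    `tree` maps a node name to the list of its children; the root gets 1.
    """
    gen = primes()
    labels = {root: 1}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            labels[child] = labels[node] * next(gen)
            stack.append(child)
    return labels


def is_ancestor(labels, a, d):
    """True if node a is a proper ancestor of node d."""
    return a != d and labels[d] % labels[a] == 0
```

For example, labeling `{"doc": ["book", "article"], "book": ["title", "author"]}` from the root `"doc"` lets `is_ancestor(labels, "book", "title")` be answered without walking the tree. Labels grow quickly with depth and node count, which is the space problem the abstract attributes to existing labeling methods.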

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr

Digital Library[ Search Result ]

Opt Tree: Write Optimized Tree Using Optane DCPM Internal Buffer
