Digital Library [Search Result]
An Efficient Similarity Measure for Purchase Histories Considering Hierarchical Classification of Products
http://doi.org/10.5626/JOK.2020.47.10.999
In an online shopping mall or offline store, the products purchased by each customer over time form that customer's purchase history. In most cases, products also have a hierarchical classification that organizes them into categories and subcategories. In this paper, we propose a new similarity measure for purchase histories that considers not only the purchase order of products but also their hierarchical classification. The proposed method extends dynamic time warping (DTW) similarity, a representative existing similarity measure for sequences, to reflect the hierarchical classification of products. In the existing method, the similarity between elements of two sequences is only 0 or 1, depending on whether the two elements are the same. The proposed method, by contrast, can assign any real number between 0 and 1 as the similarity between two elements, reflecting their hierarchical classification. We also propose an efficient method for computing the proposed similarity measure, which uses a segment tree to efficiently evaluate the similarity between two products in the hierarchical classification tree. Through various experiments on real data, we show that the proposed method measures the similarity between purchase histories of hierarchically classified products both effectively and efficiently.
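A minimal sketch of the idea follows. It assumes, hypothetically, that the similarity of two products is the depth of their lowest common ancestor (LCA) in the category tree divided by the depth of the deeper product, and that DTW accumulates the cost 1 - similarity, so the usual 0/1 cost becomes a real-valued one. The LCA is found with a segment tree over an Euler tour of the tree, a standard way to answer tree queries with a segment tree; the paper's actual similarity formula and use of the segment tree may differ. The names `LCAIndex` and `hierarchical_dtw` are illustrative.

```python
from typing import Dict, List, Optional


class LCAIndex:
    """Hypothetical helper: LCA queries on a category tree via an Euler tour
    plus a bottom-up segment tree that keeps the shallowest node per range."""

    def __init__(self, parent: Dict[str, Optional[str]]):
        # `parent` maps every node to its parent; the root maps to None.
        children: Dict[str, List[str]] = {node: [] for node in parent}
        root = next(node for node, par in parent.items() if par is None)
        for node, par in parent.items():
            if par is not None:
                children[par].append(node)
        self.depth: Dict[str, int] = {root: 0}
        self.first: Dict[str, int] = {root: 0}  # first Euler-tour position
        self.euler: List[str] = [root]
        stack = [(root, iter(children[root]))]
        while stack:  # iterative DFS; append a node every time the tour is at it
            node, it = stack[-1]
            child = next(it, None)
            if child is None:
                stack.pop()
                if stack:
                    self.euler.append(stack[-1][0])  # returned to the parent
            else:
                self.depth[child] = self.depth[node] + 1
                self.first[child] = len(self.euler)
                self.euler.append(child)
                stack.append((child, iter(children[child])))
        self.n = len(self.euler)
        self.tree: List[Optional[str]] = [None] * self.n + self.euler
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self._shallower(self.tree[2 * i], self.tree[2 * i + 1])

    def _shallower(self, a: Optional[str], b: Optional[str]) -> Optional[str]:
        if a is None or b is None:
            return b if a is None else a
        return a if self.depth[a] <= self.depth[b] else b

    def lca(self, a: str, b: str) -> str:
        # The shallowest node between the first visits of a and b is their LCA.
        lo, hi = sorted((self.first[a], self.first[b]))
        lo, hi = lo + self.n, hi + self.n + 1  # half-open range [lo, hi)
        best: Optional[str] = None
        while lo < hi:
            if lo & 1:
                best = self._shallower(best, self.tree[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                best = self._shallower(best, self.tree[hi])
            lo >>= 1
            hi >>= 1
        assert best is not None
        return best

    def similarity(self, a: str, b: str) -> float:
        """Assumed element similarity in [0, 1]: depth(LCA) / deeper depth."""
        if a == b:
            return 1.0
        deeper = max(self.depth[a], self.depth[b])
        return self.depth[self.lca(a, b)] / deeper


def hierarchical_dtw(x: List[str], y: List[str], index: LCAIndex) -> float:
    """DTW distance whose local cost is 1 - similarity instead of 0 or 1."""
    inf = float("inf")
    dp = [[inf] * (len(y) + 1) for _ in range(len(x) + 1)]
    dp[0][0] = 0.0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            cost = 1.0 - index.similarity(x[i - 1], y[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[len(x)][len(y)]
```

With a toy tree where apple and banana sit under fruit (under food) and coffee and tea sit under drink, the histories `["apple", "coffee"]` and `["banana", "tea"]` are at distance about 0.83 (1/3 + 1/2), whereas plain 0/1-cost DTW would put them at 2.0 because no two products are identical.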
Semantic Similarity-based Intent Analysis using Pre-trained Transformer for Natural Language Understanding
Sangkeun Jung, Hyein Seo, Hyunji Kim, Taewook Hwang
http://doi.org/10.5626/JOK.2020.47.8.748
Natural language understanding (NLU) is a core technique for developing robots, smart messengers, and natural interfaces. In this study, we propose a novel similarity-based intent analysis method for NLU, in place of the typical classification methods used for intent analysis. To this end, we introduce neural network-based text and semantic frame readers that learn semantic vectors from paired text and semantic frame instances. We propose methods for projecting text to vectors and semantic frames to vectors using a pre-trained transformer. We then propose a method that attaches to a query sentence the intent tag of the nearest training sentence, found by measuring semantic vector distances in the vector space. Four experiments on natural language corpora suggest that the proposed method outperforms existing intent analysis techniques. The experiments use corpora in Korean and English: the two Korean experiments use weather and navigation corpora, and the two English experiments use air travel information system and voice platform corpora.
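The final tagging step can be sketched as a nearest-neighbor lookup. This assumes the semantic vectors have already been produced by the trained readers and uses Euclidean distance as the vector distance; the function name `nearest_intent` and the intent tags in the example are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def nearest_intent(query_vec: Sequence[float],
                   training: List[Tuple[Sequence[float], str]]) -> str:
    """Return the intent tag of the training sentence closest to the query.

    `training` holds (semantic vector, intent tag) pairs; the vectors are
    assumed to come from the pre-trained-transformer readers.
    """
    _, tag = min(training, key=lambda item: math.dist(query_vec, item[0]))
    return tag


training = [([0.9, 0.1], "weather.ask"), ([0.1, 0.9], "navigation.route")]
print(nearest_intent([0.8, 0.3], training))  # weather.ask
```

Because no classifier head is involved, a new intent can in principle be supported by adding tagged training vectors rather than retraining an output layer.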
Passage Re-ranking Method Based on Sentence Similarity Through Multitask Learning
Youngjin Jang, Hyeon-gu Lee, Jihyun Wang, Chunghee Lee, Harksoo Kim
http://doi.org/10.5626/JOK.2020.47.4.416
A machine reading comprehension (MRC) system is a question answering system in which a computer understands a given passage and answers questions about it. Recently, with the development of deep neural networks, MRC systems have been actively studied, and research is under way on open-domain MRC systems, which find the correct answer in the results of an information retrieval (IR) model rather than in a given passage. However, if the IR model fails to retrieve a passage containing the correct answer, the MRC system cannot answer the question. That is, the performance of an open-domain MRC system depends on the performance of its IR model, so a high-performance IR model is a prerequisite for a high-performance open-domain MRC system. Previous work has improved IR models through query expansion and re-ranking. In this paper, we propose a re-ranking method using deep neural networks. The proposed model re-ranks the retrieval results (passages) using multi-task learning-based sentence similarity. In experiments on 58,980 pairs of MRC data, it improves performance by approximately 8% over the existing IR model.
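The re-ranking step can be sketched as follows, assuming (hypothetically) that a passage is scored by the highest similarity between the question and any of its sentences. Token-level Jaccard similarity stands in for the paper's multi-task neural sentence-similarity model, and any scorer with the same signature can be passed as `sim`; all names here are illustrative.

```python
import re
from typing import Callable, List


def jaccard(a: str, b: str) -> float:
    """Stand-in sentence similarity: overlap of lower-cased word tokens."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def rerank(question: str, passages: List[str],
           sim: Callable[[str, str], float] = jaccard) -> List[str]:
    """Reorder IR results by the best question-sentence similarity."""
    def score(passage: str) -> float:
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", passage) if s]
        return max((sim(question, s) for s in sentences), default=0.0)
    # sorted() stays stable with reverse=True, so ties keep the IR order.
    return sorted(passages, key=score, reverse=True)


ir_results = ["Paris is the capital of France.",
              "Hamlet is a tragedy. It was written by William Shakespeare."]
print(rerank("Who wrote Hamlet?", ir_results)[0])
```

Keeping the IR model's order for ties means the re-ranker only changes the ranking where the similarity model actually expresses a preference.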
Recommending Similar Users Through Interaction Analysis in Social IoT Environments
Yeondong Kim, Dojin Choi, Jongtae Lim, Kyoungsoo Bok, Jaesoo Yoo
http://doi.org/10.5626/JOK.2020.47.1.61
Recently, there has been extensive research on the social internet of things (Social IoT), which combines social networks and the internet of things. Social IoT connects users and objects and establishes relationships among them so that information can be shared between objects and users. In this paper, we propose a method that recommends similar users by considering the interactions between objects and users in Social IoT environments. Similar users are found by analyzing the behavior of users around an object. The proposed method further improves the accuracy of the similarity by also calculating an interest similarity from documents that users write in social networks. Finally, it recommends the Top-N users as similar users based on these two similarity values. To show the superiority of the proposed method, we conducted various performance evaluations.
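Combining the two similarity values and picking the Top-N users could look like the sketch below. It assumes that both signals are sparse vectors (object-interaction counts and interest terms from users' documents), that each is compared with cosine similarity, and that they are merged with a hypothetical weight `alpha`; the paper's actual formulas are not given in the abstract.

```python
import heapq
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse vector: feature name -> weight


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def top_n_similar(target: str, interactions: Dict[str, Vector],
                  interests: Dict[str, Vector], n: int,
                  alpha: float = 0.5) -> List[str]:
    """Rank other users by alpha * interaction sim + (1 - alpha) * interest sim."""
    scores: Dict[str, float] = {}
    for user, vec in interactions.items():
        if user == target:
            continue
        s_interaction = cosine(interactions[target], vec)
        s_interest = cosine(interests.get(target, {}), interests.get(user, {}))
        scores[user] = alpha * s_interaction + (1 - alpha) * s_interest
    return heapq.nlargest(n, scores, key=scores.get)


interactions = {"u1": {"lamp": 3, "tv": 1}, "u2": {"lamp": 2, "tv": 1}, "u3": {"door": 5}}
interests = {"u1": {"soccer": 1}, "u2": {"soccer": 1}, "u3": {"cooking": 1}}
print(top_n_similar("u1", interactions, interests, n=1))  # ['u2']
```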
An Efficient Distributed In-memory High-dimensional Indexing Scheme for Content-based Image Retrieval in Spark Environments
Dojin Choi, Songhee Park, Yeondong Kim, Jiwon Wee, Hyeonbyeong Lee, Jongtae Lim, Kyoungsoo Bok, Jaesoo Yoo
http://doi.org/10.5626/JOK.2020.47.1.95
Content-based image retrieval, which searches for objects in images, has been used for criminal activity monitoring and object tracking in video. In this paper, we propose a distributed in-memory high-dimensional indexing scheme for content-based image retrieval. It supports similarity search over massive numbers of feature vectors extracted from images or objects. To process large amounts of data, we use the big data platform Spark. We also employ a master/slave model to allocate distributed query processing efficiently: the master distributes data and queries, and the slaves index and process them. To address the k-NN query processing performance problems of existing distributed high-dimensional indexing schemes, we propose optimization methods for k-NN query processing that consider density and search costs. We conduct various performance evaluations to demonstrate the superiority of the proposed scheme.
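The basic master/slave k-NN pattern can be sketched in plain Python: each slave computes an exact local top-k over its partition, and the master merges the partial lists. This is correct because every global nearest neighbor must appear in the local top-k of the partition that holds it. The sketch omits the paper's density- and cost-based optimizations and does not use Spark; in Spark, the per-partition step would typically run inside a partition-wise transformation such as `RDD.mapPartitions`. The function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[str, Sequence[float]]  # (image or object id, feature vector)


def local_knn(partition: List[Point], query: Sequence[float],
              k: int) -> List[Tuple[float, str]]:
    """Slave side: exact k-NN over one partition, as (distance, id) pairs."""
    return heapq.nsmallest(k, ((math.dist(query, vec), pid) for pid, vec in partition))


def distributed_knn(partitions: List[List[Point]], query: Sequence[float],
                    k: int) -> List[Tuple[float, str]]:
    """Master side: broadcast the query, then merge the partial top-k lists."""
    candidates: List[Tuple[float, str]] = []
    for part in partitions:
        candidates.extend(local_knn(part, query, k))
    return heapq.nsmallest(k, candidates)


partitions = [[("a", (0, 0)), ("b", (5, 5))], [("c", (1, 0)), ("d", (9, 9))]]
print(distributed_knn(partitions, (0, 0), k=2))  # [(0.0, 'a'), (1.0, 'c')]
```

Every slave here scans its whole partition; the optimizations described in the abstract aim to avoid exactly that cost by taking density and search cost into account.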
Automatic Extraction of Sentence Embedding Features for Question Similarity Analysis in Dialogues
Kyo-Joong Oh, Dongkun Lee, Chae-Gyun Lim, Ho-Jin Choi
http://doi.org/10.5626/JOK.2019.46.9.909
This paper describes a method for automatically extracting feature vectors that can be used to analyze the similarity among natural language sentences. Sentence similarity analysis, which measures the semantic or structural similarity between sentences, is a necessary part of natural language understanding. Its results can be used to find answers in question answering (Q&A) systems and dialogue systems. The similarity analysis uses sentence vectors extracted by two deep learning models: a Recurrent Neural Network (RNN), which reflects the sequential information of expressions such as syllables and semantic morphemes, and a Convolutional Neural Network (CNN), which characterizes the appearance patterns of similar expressions such as words or phrases. In this paper, we examine the accuracy and quality of the method using sentence vectors automatically extracted by the models from dialogues related to banking services. The method finds more similar questions and answers in FAQs than existing methods. The automatic feature extraction method can be used to analyze the similarity of Korean sentences across various application domains and systems.
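One way the two extracted vectors might be used together for FAQ retrieval is sketched below. It assumes, hypothetically, that the RNN and CNN sentence vectors are simply concatenated and that FAQ questions are ranked by cosine similarity; the abstract does not say how the two vectors are combined, and the vectors in the example are made up.

```python
import math
from typing import List, Sequence, Tuple

# (FAQ question, RNN sentence vector, CNN sentence vector)
FaqEntry = Tuple[str, Sequence[float], Sequence[float]]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_faq(query_rnn: Sequence[float], query_cnn: Sequence[float],
             faq: List[FaqEntry], k: int = 3) -> List[str]:
    """Return the k FAQ questions most similar to the query (assumed concat)."""
    q = list(query_rnn) + list(query_cnn)
    scored = sorted(faq, key=lambda e: cosine(q, list(e[1]) + list(e[2])), reverse=True)
    return [question for question, _, _ in scored[:k]]


faq = [("How do I open an account?", [1, 0], [0, 1]),
       ("What is the transfer fee?", [0, 1], [1, 0])]
print(rank_faq([0.9, 0.1], [0.2, 0.8], faq, k=1))  # ['How do I open an account?']
```

Concatenation lets each model contribute the features it captures best (sequence order from the RNN, local patterns from the CNN) to a single similarity score.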
An Approach to Detect Macros via Self-similarity of Mobile Input
http://doi.org/10.5626/JOK.2019.46.9.951
Macros that repeat specified in-game actions without human interaction are a major cause of unfairness in computer games. For a game service to succeed, the organized use of macros, which destroys the game's economy and can erode users' motivation to play, must be prevented. Macros are particularly easy to create and use in mobile games, because the limited hardware resources and inefficient input methods of mobile devices tend to make a mobile game's design and play sequence relatively simple compared to those of PC games. At the same time, the current macro detection methods used in mobile games can consume substantial resources. Macro detection therefore remains a challenge in mobile game services. In this paper, we propose a method to detect macros via the self-similarity of mobile input. The proposed method defines a unit of input over which self-similarity can be obtained effectively with fewer resources. We applied the proposed method to two mobile games and showed that it distinguishes macro and human activity with high accuracy.
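The intuition behind self-similarity can be illustrated with tap coordinates: a macro replays the same taps, so fixed-size units of its input look nearly identical. The sketch assumes, hypothetically, that the unit is a window of `unit` consecutive taps and that two units are similar when every corresponding tap lies within `tol` pixels; the paper's actual unit and similarity measure are not described in the abstract.

```python
import math
from itertools import combinations
from typing import List, Tuple

Tap = Tuple[float, float]  # (x, y) screen coordinates of one touch


def self_similarity(taps: List[Tap], unit: int, tol: float) -> float:
    """Fraction of pairs of non-overlapping units whose taps all match within tol."""
    windows = [taps[i:i + unit] for i in range(0, len(taps) - unit + 1, unit)]
    pairs = list(combinations(windows, 2))
    if not pairs:
        return 0.0
    similar = sum(
        all(math.dist(p, q) <= tol for p, q in zip(a, b)) for a, b in pairs
    )
    return similar / len(pairs)


macro_input = [(100, 200), (300, 400)] * 4
human_input = [(100, 200), (300, 400), (150, 260), (320, 380), (90, 210), (280, 500)]
print(self_similarity(macro_input, unit=2, tol=5))  # 1.0
print(self_similarity(human_input, unit=2, tol=5))  # 0.0
```

A score close to 1.0 suggests scripted repetition; choosing a small unit keeps the number of comparisons, and hence resource use, low.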
Semi-automatic Expansion for a Chatting Corpus Based on a K-means Clustering Method And Similarity Measure
http://doi.org/10.5626/JOK.2019.46.5.440
In this paper, we propose a semi-automatic method to expand a chatting corpus using a large amount of utterance data from movie subtitles and drama scripts. To expand the chatting corpus, the proposed system uses a previously constructed chatting corpus and a similarity measure. If the similarity between the previously constructed chatting corpus and an input utterance was greater than a threshold set in the experiment, the input utterance was selected as a new chatting utterance, that is, it was judged to form a correct chatting pair. We used morpheme-unit word embeddings and a Convolutional Neural Network (CNN) to efficiently calculate the similarity of utterance embeddings. To speed up the semi-automatic expansion process, we reduce the amount of computation by clustering the chatting corpus with the K-means clustering algorithm. Experimental results showed that the precision, recall, and F1 score of the proposed system were 61.28%, 53.19%, and 56.94%, respectively, which were 5.16%p, 6.09%p, and 5.73%p higher than those of the baseline system. In terms of speed, the proposed system was also about a hundred times faster.
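The way clustering cuts the number of comparisons can be sketched as follows: cluster the existing corpus embeddings once, then compare each input utterance only against corpus utterances in its nearest cluster and accept it if the best cosine similarity exceeds the threshold. The embeddings are assumed to be precomputed (the paper uses morpheme-unit embeddings and a CNN); the minimal K-means here initializes with the first k points for determinism, and `ChatCorpusFilter` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence

Vec = List[float]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(centroids: List[Vec], p: Sequence[float]) -> int:
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], p))


def kmeans(points: List[Vec], k: int, iters: int = 20) -> List[Vec]:
    """Plain Lloyd's algorithm; initial centroids are the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters: List[List[Vec]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(centroids, p)].append(p)
        centroids = [[sum(col) / len(c) for col in zip(*c)] if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids


class ChatCorpusFilter:
    def __init__(self, corpus: List[Vec], k: int, threshold: float):
        self.centroids = kmeans(corpus, k)
        self.members: Dict[int, List[Vec]] = {i: [] for i in range(k)}
        for v in corpus:
            self.members[nearest(self.centroids, v)].append(v)
        self.threshold = threshold

    def accept(self, utterance: Vec) -> bool:
        """Compare only against the utterance's nearest cluster."""
        cluster = self.members[nearest(self.centroids, utterance)]
        best = max((cosine(utterance, v) for v in cluster), default=0.0)
        return best > self.threshold
```

With k clusters of similar size, each input is compared with roughly 1/k of the corpus instead of all of it, at the risk of missing a close match that fell into a neighboring cluster.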
Sentence Similarity Prediction based on Siamese CNN-Bidirectional LSTM with Self-attention
Mintae Kim, Yeongtaek Oh, Wooju Kim
http://doi.org/10.5626/JOK.2019.46.3.241
This paper presents a deep learning model for measuring the semantic similarity between sentences. In general, most models for measuring similarity use word-level or morpheme-level embeddings. However, word-level or morpheme-level embeddings increase model complexity because of the large size of the dictionary. To solve this problem, we propose a Siamese CNN-Bidirectional LSTM model that uses phonemes instead of words or morphemes and combines long short-term memory (LSTM) with 1D convolutional neural networks whose various window lengths bind phonemes together. For evaluation, we compared our model with the Manhattan LSTM (MaLSTM), which performs well at measuring the similarity between questions, on the Naver Q&A dataset (similar to the Kaggle Quora Question Pairs dataset).
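Two pieces of this setup can be made concrete without a neural network. First, Korean text can be reduced to phonemes (jamo) with the Unicode Hangul syllable decomposition, which keeps the input vocabulary to a few dozen symbols. Second, the MaLSTM baseline scores a pair of sentence encodings with the Manhattan similarity exp(-||h1 - h2||_1). The Siamese CNN-BiLSTM encoder itself is not shown, and whether the paper uses exactly this jamo representation is an assumption.

```python
import math
from typing import List, Sequence


def to_phonemes(text: str) -> List[str]:
    """Split precomposed Hangul syllables into conjoining jamo; other
    characters pass through unchanged."""
    out: List[str] = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code < 11172:  # 19 initials * 21 medials * 28 finals
            out.append(chr(0x1100 + code // 588))         # initial consonant
            out.append(chr(0x1161 + (code % 588) // 28))  # medial vowel
            if code % 28:
                out.append(chr(0x11A7 + code % 28))        # final consonant
        else:
            out.append(ch)
    return out


def malstm_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Manhattan similarity exp(-L1 distance), which lies in (0, 1]."""
    return math.exp(-sum(abs(a - b) for a, b in zip(h1, h2)))


print(to_phonemes("한"))  # ['ᄒ', 'ᅡ', 'ᆫ']
```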
Efficient Similarity Search of Multi-Attribute Records using An Optimal Attribute Assignment
http://doi.org/10.5626/JOK.2019.46.2.193
In this paper, we investigate the problem of record similarity search when each record consists of multiple attributes. Although similarity measures exist, they only quantify the similarity between two sets of data, leaving out the similarities between the attributes of records. To address this problem, we propose a record similarity measure that considers the similarities among the attributes of records. We also develop a novel filtering technique that efficiently generates candidate records with respect to a record similarity threshold. In addition, we propose an efficient verification technique that checks whether a candidate is a true match. Through an experimental study, we show that the proposed techniques can search for similar records with high efficiency and precision.
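Following the title's notion of an optimal attribute assignment, a record similarity can be sketched as the best one-to-one matching between the attributes of two records. This sketch assumes token Jaccard similarity between attribute values and normalization by the larger attribute count, and it brute-forces all assignments, so it only suits a handful of attributes; a practical version would use the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`). The paper's filtering and verification techniques are not shown.

```python
from itertools import permutations
from typing import List


def jaccard(a: str, b: str) -> float:
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def record_similarity(r: List[str], s: List[str]) -> float:
    """Best attribute-to-attribute assignment, normalized by the larger record."""
    if len(r) > len(s):
        r, s = s, r
    if not r:
        return 0.0
    best = max(sum(jaccard(a, b) for a, b in zip(r, perm))
               for perm in permutations(s, len(r)))
    return best / len(s)


r = ["John Smith", "Seoul Korea"]
s = ["Seoul South Korea", "Smith John"]
print(record_similarity(r, s))  # about 0.833, although position-wise matching gives 0
```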

Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr