Journal of KIISE

Search : [ keyword: data mining ] (7)

An Effective Detection Method of Anomalous Sequences Considering the Occurrence Order and Time Interval of the Elements

http://doi.org/10.5626/JOK.2021.48.4.469

Recently, sequence data consisting of elements has been generated rapidly over time in various applications. Although various methods for detecting anomalous sequences among given sequences have been actively studied, most of them consider only the occurrence order of the elements. In this paper, we propose an effective anomalous sequence detection method that considers not only the occurrence order of the elements but also the time interval between the elements. Specifically, the proposed method uses a model that combines two autoencoders. The first is an LSTM autoencoder, which learns the features of the occurrence order of elements, and the second is a graph autoencoder, which learns the features of the time interval between the elements. After training is complete, each sequence is input to the trained model and reconstructed by it. If the occurrence order and time interval of elements in the reconstructed sequence differ greatly from those in the original sequence, the sequence is determined to be anomalous. Through various experiments using synthetic data, we confirmed that the proposed method detects anomalous sequences more effectively than a method that uses an RNN autoencoder to learn the occurrence order of the elements, methods that use a single LSTM autoencoder, and a method that does not use a deep learning model.
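The decision rule described in this abstract (flag a sequence when its reconstruction differs greatly from the original in element order or in time intervals) can be illustrated with a minimal sketch. It assumes the reconstruction is already available from some trained model; the function names, the two error measures, and the threshold values are hypothetical choices for illustration and are not taken from the paper.

```python
from typing import Sequence


def order_error(original: Sequence[str], reconstructed: Sequence[str]) -> float:
    """Fraction of positions where the reconstructed element differs."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    if not original:
        return 0.0
    mismatches = sum(o != r for o, r in zip(original, reconstructed))
    return mismatches / len(original)


def interval_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Mean absolute difference between original and reconstructed intervals."""
    if len(original) != len(reconstructed):
        raise ValueError("interval lists must have the same length")
    if not original:
        return 0.0
    return sum(abs(o - r) for o, r in zip(original, reconstructed)) / len(original)


def is_anomalous(
    elements: Sequence[str],
    rec_elements: Sequence[str],
    intervals: Sequence[float],
    rec_intervals: Sequence[float],
    order_threshold: float = 0.5,      # assumed value, tune on validation data
    interval_threshold: float = 1.0,   # assumed value, same unit as the intervals
) -> bool:
    """Flag the sequence if either reconstruction error exceeds its threshold."""
    return (
        order_error(elements, rec_elements) > order_threshold
        or interval_error(intervals, rec_intervals) > interval_threshold
    )
```

In practice the thresholds would be chosen from the error distribution of sequences known to be normal; the constants above are placeholders.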

A Comparative Study of Machine Learning Algorithms for Diagnosis of Ischemic Heart Disease

Pyoung-Woo Park, Min-Koo Kim, Hong-Seok Lim, Duk-Yong Yoon, Seok-Won Lee

http://doi.org/10.5626/JOK.2018.45.4.376

In recent years, studies on artificial intelligence have been actively conducted, and artificial intelligence technology supports accurate and efficient decision-making. The accumulation of medical knowledge and related data is also accelerating, and studies on diagnosing diseases with artificial intelligence technology are being actively carried out. In this study, we chose a representative cardiovascular disease, ischemic heart disease, as the research domain and analyzed the available algorithms, comparing effective approaches for a medical expert system that diagnoses the disease. Concretely, the purpose of the study is to assist medical experts and physicians based on the initial patient record data, help them explain the cause of ischemic heart disease, and minimize unnecessary related tests. In addition, the experimental data can be configured so that medical professionals can use them as learning models, thereby making efficient use of their experience and knowledge.

Inverse Document Frequency-Based Word Embedding of Unseen Words for Question Answering Systems

Wooin Lee, Gwangho Song, Kyuseok Shim

http://doi.org/

A question answering (QA) system finds an actual answer to the question posed by a user, whereas a typical search engine only finds links to the relevant documents. Recent work on open-domain QA systems is receiving much attention in the fields of natural language processing, artificial intelligence, and data mining. However, prior work on QA systems simply replaces all words that are not in the training data with a single token, even though such unseen words are likely to play crucial roles in differentiating the candidate answers from the actual answers. In this paper, we propose a method to compute vectors of such unseen words by taking into account the context in which the words have occurred. We also propose a model that utilizes inverse document frequencies (IDF) to efficiently process unseen words by expanding the system’s vocabulary. Finally, we validate through experiments that the proposed method and model improve the performance of a QA system.
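One simple way to picture how IDF and context might combine is to build a vector for an unseen word as the IDF-weighted average of the vectors of known words around it. The abstract does not specify the actual formulation, so the sketch below is only an assumption for illustration; `inverse_document_frequency` and `embed_unseen_word` are hypothetical names, and the IDF formula used is the plain `log(N / df)` variant.

```python
import math
from typing import Dict, List, Optional


def inverse_document_frequency(documents: List[List[str]]) -> Dict[str, float]:
    """Compute idf(t) = log(N / df(t)) over tokenized documents."""
    doc_freq: Dict[str, int] = {}
    for doc in documents:
        for token in set(doc):  # count each token once per document
            doc_freq[token] = doc_freq.get(token, 0) + 1
    n_docs = len(documents)
    return {token: math.log(n_docs / df) for token, df in doc_freq.items()}


def embed_unseen_word(
    context: List[str],
    embeddings: Dict[str, List[float]],
    idf: Dict[str, float],
) -> Optional[List[float]]:
    """IDF-weighted average of the known context-word vectors (assumed scheme)."""
    known = [t for t in context if t in embeddings and idf.get(t, 0.0) > 0.0]
    if not known:
        return None  # no informative context words to average
    total_weight = sum(idf[t] for t in known)
    dim = len(embeddings[known[0]])
    vector = [0.0] * dim
    for token in known:
        weight = idf[token] / total_weight
        for i, value in enumerate(embeddings[token]):
            vector[i] += weight * value
    return vector
```

Words that appear in every document receive an IDF of zero and therefore do not influence the resulting vector, which is the intuition behind weighting by IDF rather than averaging uniformly.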

A Context Recognition System for Various Food Intake using Mobile and Wearable Sensor Data

Kee-Hoon Kim, Sung-Bae Cho

http://doi.org/

The development of various sensors attached to mobile and wearable devices has led to growing interest in services based on the user's current context. In this study, we propose a probabilistic model for recognizing the user's food intake context, which can occur in a great variety of situations. The model uses low-level sensor data from mobile and wrist-wearable devices that are widely available in daily life. To cope with the innate complexity and fuzziness of high-level activities like food intake, a context model represents the relevant contexts systematically based on the 4 components of activity theory and the 5 W’s, and a tree-structured Bayesian network recognizes the probabilistic state. To verify the proposed method, we collected 383 minutes of data from 4 people over a week and found that the proposed method outperforms conventional machine learning methods in accuracy (93.21%). We also conducted a scenario-based test and investigated the contribution of individual components to recognition.

An Efficient Large Graph Clustering Technique based on Min-Hash

Seok-Joo Lee, Jun-Ki Min

http://doi.org/

Graph clustering is widely used to analyze a graph and identify its properties by generating clusters consisting of similar vertices. Recently, large graph data has been generated in diverse applications such as Social Network Services (SNS), the World Wide Web (WWW), and telephone networks. Therefore, the importance of graph clustering algorithms that process large graph data efficiently has increased. In this paper, we propose an effective clustering algorithm that generates clusters for large graph data efficiently. Our proposed algorithm estimates similarities between clusters in graph data using Min-Hash and constructs clusters according to the computed similarities. In our experiments with real-world data sets, we demonstrate the efficiency of our proposed algorithm by comparing it with existing algorithms.
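As background for the Min-Hash step mentioned in this abstract, the sketch below shows generic Min-Hash estimation of the Jaccard similarity between two sets of integer vertex ids (for example, the neighbor sets of two vertices or the member sets of two clusters). It is not the paper's clustering algorithm; the function names, the linear hash family, and the choice of prime are assumptions made for illustration.

```python
import random
from typing import Iterable, List, Tuple

# Mersenne prime 2**31 - 1; vertex ids are assumed to be non-negative and smaller.
PRIME = 2_147_483_647


def make_hash_params(num_hashes: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(num_hashes)]


def minhash_signature(items: Iterable[int], params: List[Tuple[int, int]]) -> List[int]:
    """For each hash function, keep the minimum hash value over the set."""
    members = list(items)
    if not members:
        raise ValueError("cannot build a signature for an empty set")
    return [min((a * x + b) % PRIME for x in members) for a, b in params]


def estimate_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """Fraction of signature positions that agree approximates Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for p, q in zip(sig_a, sig_b) if p == q)
    return matches / len(sig_a)
```

Because signatures have a fixed length regardless of set size, comparing two signatures costs the same no matter how large the underlying neighbor or cluster sets are, which is why the technique suits large graphs.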

Secure Multiparty Computation of Principal Component Analysis

Sang-Pil Kim, Sanghun Lee, Myeong-Seon Gil, Yang-Sae Moon, Hee-Sun Won

http://doi.org/

In recent years, many research efforts have been made on privacy-preserving data mining (PPDM) for large volumes of data. In this paper, we propose a PPDM solution based on principal component analysis (PCA), which can be widely used in computing correlation among sensitive data sets. The general method of computing PCA is to collect all the data spread across multiple nodes into a single node before starting the PCA computation; however, this approach discloses sensitive data of individual nodes, involves a large amount of computation, and incurs large communication overheads. To solve this problem, we present an efficient method that securely computes PCA without the need to collect all the data. The proposed method shares only limited information among individual nodes, but obtains the same result as that of the original PCA. In addition, we present a dimensionality reduction technique for the proposed method and use it to improve the performance of secure similar document detection. Finally, through various experiments, we show that the proposed method works effectively and efficiently on a large amount of multi-dimensional data.
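To see why PCA does not inherently require pooling the raw rows, note that the covariance matrix (whose eigenvectors PCA uses) can be assembled from per-node counts, sums, and sums of products. The sketch below demonstrates only that aggregation identity under assumed names (`local_stats`, `merge_stats`, `covariance`); it is not the paper's protocol and includes no cryptographic protection of the shared statistics.

```python
from typing import List, Tuple

Stats = Tuple[int, List[float], List[List[float]]]


def local_stats(rows: List[List[float]]) -> Stats:
    """Per-node summary: row count, column sums, and sums of pairwise products."""
    dim = len(rows[0])
    sums = [sum(r[j] for r in rows) for j in range(dim)]
    products = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    return len(rows), sums, products


def merge_stats(parts: List[Stats]) -> Stats:
    """Add the summaries of all nodes element-wise."""
    dim = len(parts[0][1])
    count = sum(p[0] for p in parts)
    sums = [sum(p[1][j] for p in parts) for j in range(dim)]
    products = [[sum(p[2][i][j] for p in parts) for j in range(dim)] for i in range(dim)]
    return count, sums, products


def covariance(stats: Stats) -> List[List[float]]:
    """Sample covariance matrix recovered from merged summaries."""
    count, sums, products = stats
    dim = len(sums)
    return [
        [(products[i][j] - sums[i] * sums[j] / count) / (count - 1) for j in range(dim)]
        for i in range(dim)
    ]
```

The merged result equals the covariance computed on the pooled data, so the principal components are unchanged; a secure protocol would additionally have to protect the per-node summaries themselves.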

Secure Multi-Party Computation of Correlation Coefficients

Sun Kyong Hong, Sang Pil Kim, Hyo Sang Lim, Yang Sae Moon

http://doi.org/

In this paper, we address the problem of computing Pearson correlation coefficients and Spearman’s rank correlation coefficients in a secure manner while data providers preserve the privacy of their own data in a distributed environment. For data mining or data analysis in a distributed environment, data providers (data owners) need to share their original data with each other. However, the original data may often contain very sensitive information, and thus data providers prefer not to disclose their original data in order to preserve privacy. In this paper, we formally define secure correlation computation, SCC for short, as the problem of computing correlation coefficients in a distributed computing environment while preserving the data privacy (i.e., not disclosing the sensitive data) of multiple data providers. We then present SCC solutions for Pearson and Spearman’s correlation coefficients using secure scalar product. We show the correctness and security of the proposed solutions by presenting theorems and proving them formally. We also show empirically that the proposed solutions perform well enough for practical applications.
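The link between correlation coefficients and a scalar product can be made concrete: if each provider centers its own vector and scales it to unit length locally, the Pearson coefficient is exactly the dot product of the two normalized vectors, and Spearman's coefficient is the same computation applied to ranks. The sketch below illustrates this reduction only. The `scalar_product` function is a plain, insecure placeholder for where a secure scalar product protocol would be used, and all function names are hypothetical rather than taken from the paper.

```python
import math
from typing import List


def normalize(values: List[float]) -> List[float]:
    """Center a vector and scale it to unit length (done locally by each provider)."""
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(c * c for c in centered))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant vector")
    return [c / norm for c in centered]


def scalar_product(a: List[float], b: List[float]) -> float:
    """Placeholder: a real deployment would run a secure protocol here."""
    return sum(x * y for x, y in zip(a, b))


def ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson r as the scalar product of locally normalized vectors."""
    return scalar_product(normalize(x), normalize(y))


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman rho as Pearson r computed on locally derived ranks."""
    return pearson(ranks(x), ranks(y))
```

Since normalization and ranking need only one provider's own data, the scalar product is the only step that touches both vectors, which is why a secure scalar product suffices as the joint building block in this reduction.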

Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal

Editorial Office

Tel. +82-2-588-9240
Fax. +82-2-521-1352
E-mail. chwoo@kiise.or.kr