Vol. 49, No. 3,
Mar. 2022
Performance Analysis of CERN EOS Distributed File System under Bare-metal and Virtualization Environments
Jun-Yeong Lee, Moon-Hyun Kim, Kyeong-Jun Kim, Seo-Young Noh
http://doi.org/10.5626/JOK.2022.49.3.189
To store large amounts of data, distributed file systems have been used in many research facilities and large-scale data centers. Traditionally, distributed file systems were configured by installing the file system directly on servers, a setup referred to as "bare-metal". Recently, thanks to easy management and fast failover capabilities, these systems have increasingly been configured and delivered through virtual environments. In this paper, we analyzed the EOS distributed file system developed and used by CERN (Conseil Européen pour la Recherche Nucléaire), which produces the largest amount of experimental data in the world. Using both a bare-metal environment and a KVM (Kernel-based Virtual Machine)-based virtual environment, we analyzed the file system performance of the two environments. We compared their performance, analyzed the characteristics of each environment, and, based on our experimental results, presented the advantages in I/O performance of the distributed file system in the virtual environment.
Implementation and Application of Functional Encryption-Based Matrix Multiplication
http://doi.org/10.5626/JOK.2022.49.3.196
Functional encryption is an encryption scheme that allows the holder of a secret key to obtain only the value of a function of the plaintext from a ciphertext, without learning anything else about the plaintext. In this paper, we proposed a method to compute a matrix product based on inner-product functional encryption and accelerated the proposed method by applying precomputation. In addition, we proposed a privacy-preserving application that reduces the dimensionality of vectors by performing secure principal component analysis (PCA) based on the proposed method. According to the experimental results, functional-encryption-based multiplication of a 1000-dimensional square matrix and a 1000-dimensional vector took 452.66 seconds; with precomputation using 4.46 MB of memory, it was accelerated by 3.81 times, completing in 118.87 seconds.
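The reduction underlying the abstract can be sketched without any cryptography: a matrix-vector product decomposes into one inner product per matrix row, which is exactly the operation an inner-product functional encryption scheme evaluates. A minimal illustrative sketch (the `inner_product` function stands in for FE decryption; no encryption is performed here):

```python
# Toy sketch (no real cryptography): a matrix-vector product A @ x
# decomposes into one inner product per row of A, which is the single
# operation an inner-product functional encryption scheme can evaluate.

def inner_product(u, v):
    """Stand-in for FE decryption: reveals only <u, v>, nothing else."""
    return sum(a * b for a, b in zip(u, v))

def matvec_via_inner_products(A, x):
    # One FE evaluation per row; in the real scheme the evaluator never
    # sees x itself, only the per-row inner-product results.
    return [inner_product(row, x) for row in A]

A = [[1, 2], [3, 4], [5, 6]]
x = [10, 1]
print(matvec_via_inner_products(A, x))  # [12, 34, 56]
```

The precomputation described in the abstract would cache ciphertext-dependent intermediate values so that repeated evaluations amortize their cost; that part is omitted here.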
Algorithms for Dividing 1-dimensional Point Set into Rainbow Subsets
http://doi.org/10.5626/JOK.2022.49.3.201
When color is assigned to data that are expressed by a set of points in geometric space, a set of points that includes at least one point of each color is defined as a color-spanning set or a rainbow set. This paper suggests algorithms for determining optimal ways of selecting points from a colored one-dimensional point set such that the subsets composed of contiguous (selected) points and the set of remaining points are all rainbow sets. The suggested algorithms aim to minimize the number of selected points or minimize the total lengths of the regions that contain the selected points.
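The rainbow-set predicate the abstract defines can be stated in a few lines. A minimal sketch (point and color representations are illustrative assumptions):

```python
# Minimal sketch: a set of colored points is a "rainbow set" (or
# color-spanning set) if it contains at least one point of every
# color in the palette.

def is_rainbow(points, palette):
    """points: list of (coordinate, color) pairs on the 1-D line."""
    return {color for _, color in points} >= set(palette)

palette = {"red", "green", "blue"}
selected = [(1.0, "red"), (2.5, "blue"), (4.0, "green")]
remaining = [(0.5, "green"), (3.0, "red"), (5.0, "blue")]

# A valid division requires both the selected contiguous subsets and
# the remaining points to be rainbow sets.
print(is_rainbow(selected, palette) and is_rainbow(remaining, palette))  # True
```

The optimization problems in the paper (minimizing the number of selected points, or the total length of the selected regions) are search problems over divisions satisfying this predicate.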
Programming New Operator Symbols Using C++ Operator Overloading
http://doi.org/10.5626/JOK.2022.49.3.207
Some modern programming languages, such as Haskell, allow users to define new operators, but most programming languages do not. This limitation persists even in languages that support operator overloading, such as C++: C++ allows changing the meaning of native operators but does not allow defining a new operator. Despite this limitation, this paper explains how to program a new operator symbol in C++. The idea is, in fact, to mimic a new operator symbol using C++ operator overloading. The trick relies on the "maximal munch" principle adopted by the lexical analyzer. The proposed method improves the writability of client code and is expected to increase the degree of freedom a programmer has in writing code.
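The same mimic-an-operator idea has a well-known analogue in Python, sketched below. Note the difference: the C++ technique in the paper chains native operator tokens and exploits the lexer's maximal-munch tokenization, while this Python analogue only relies on overloading `|` through a proxy object; it illustrates the shape of the trick, not the paper's C++ mechanism.

```python
# Python analogue of mimicking a user-defined infix operator: no new
# operator token exists, but overloading `|` on a proxy object lets
# `a |dot| b` read like one. (The paper's C++ version instead chains
# native operators, relying on the lexer's "maximal munch" rule.)

class Infix:
    def __init__(self, func):
        self.func = func
    def __ror__(self, left):          # handles `a | dot`
        return Infix(lambda right: self.func(left, right))
    def __or__(self, right):          # handles `(a | dot) | b`
        return self.func(right)

dot = Infix(lambda u, v: sum(a * b for a, b in zip(u, v)))

print([1, 2, 3] |dot| [4, 5, 6])  # 32
```

Because `|` is left-associative, `[1, 2, 3] |dot| [4, 5, 6]` first binds the left operand via `__ror__`, then applies the stored function via `__or__`.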
Effective Transfer Learning in Text Classification with the Label-Based Discriminative Feature Learning
http://doi.org/10.5626/JOK.2022.49.3.214
The performance of natural language processing with the transfer learning methodology has improved by pre-training language models on large amounts of general data and applying them to downstream tasks. However, because the data used in pre-training is unrelated to the downstream tasks, the models learn general features rather than features specific to those tasks. This paper proposes a novel method for training the embeddings of pre-trained models to learn features specific to the downstream task. The proposed method learns the label features of the downstream task through contrastive learning with label embeddings and sampled data pairs. To demonstrate the performance of the proposed method, we conducted experiments on sentence classification datasets and evaluated whether features of the downstream tasks had been learned, using PCA (principal component analysis) and clustering of the embeddings.
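The contrastive objective described above can be sketched generically: pull a sentence embedding toward its label's embedding and push it away from the other labels' embeddings. This is a standard softmax-style contrastive loss, not necessarily the paper's exact formulation; all names and vectors below are illustrative.

```python
import math

# Toy sketch (generic -log-softmax contrastive objective, not the
# paper's exact loss): score a sentence embedding against every label
# embedding, then penalize low probability mass on the true label.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def label_contrastive_loss(sentence_emb, label_embs, true_label):
    scores = [dot(sentence_emb, l) for l in label_embs]
    log_z = math.log(sum(math.exp(s) for s in scores))
    return log_z - scores[true_label]   # -log softmax(true label)

sent = [0.9, 0.1]
labels = [[1.0, 0.0], [0.0, 1.0]]       # one embedding per class label
print(label_contrastive_loss(sent, labels, true_label=0))
```

Minimizing this loss over sampled pairs draws same-label sentences toward a shared label embedding, which is what makes the learned features discriminative for the downstream task.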
Graph Embedding-Based Point-Of-Interest Recommendation Considering Weather Features
Kun Woo Lee, Jongseon Kim, Yon Dohn Chung
http://doi.org/10.5626/JOK.2022.49.3.221
As Location-Based Services (LBS) grow rapidly, Point-Of-Interest (POI) recommendation has become an active research area for providing users with information relevant to their locations. Recently, translation-based recommendation systems using graph embedding, such as TransRec, have attracted great attention. In this paper, we identify some drawbacks of TransRec: it is limited in expressing the complex relationships between users and POIs, and its relation embedding is fixed without considering weather features. We propose WAPTRec, a graph embedding-based POI recommendation method that considers the weather and overcomes these drawbacks. WAPTRec can represent the same POI embedding differently for different users by using a category projection matrix and an attention mechanism. In addition, it provides better recommendation accuracy by utilizing users' movement history, POI categories, and weather features. Experiments using public datasets showed that WAPTRec outperformed conventional translation-based recommendation methods.
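The translation-based scoring that the TransRec family uses can be sketched in a few lines: a candidate POI scores high when its embedding is close to the previous POI's embedding plus a user-specific translation vector. Conditioning that translation on weather is mimicked below by simply adding a weather vector; this additive form is an illustrative assumption, not WAPTRec's exact formulation (which uses a category projection matrix and attention).

```python
# Toy sketch of translation-based recommendation scoring: a candidate
# POI scores high when its embedding lies near
# (previous POI + user translation + weather translation).

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def neg_sq_dist(u, v):
    return -sum((a - b) ** 2 for a, b in zip(u, v))

def score(prev_poi, user_rel, weather_rel, candidate):
    translated = add(add(prev_poi, user_rel), weather_rel)
    return neg_sq_dist(translated, candidate)

prev_poi = [0.0, 1.0]
user_rel = [1.0, 0.0]
rainy = [0.0, -0.5]                     # weather-conditioned translation
cafe, park = [1.0, 0.5], [2.0, 2.0]

# Rank candidate POIs by score (higher is better).
best = max([cafe, park], key=lambda c: score(prev_poi, user_rel, rainy, c))
print(best)  # [1.0, 0.5]
```

Under rainy weather the translation points toward the indoor candidate, so the café outranks the park; changing the weather vector changes the ranking without retraining user embeddings.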
Solving Factual Inconsistency in Abstractive Summarization using Named Entity Fact Discrimination
Jeongwan Shin, Yunseok Noh, Hyun-Je Song, Seyoung Park
http://doi.org/10.5626/JOK.2022.49.3.231
Factual inconsistency in abstractive summarization refers to the problem that a generated summary can be factually inconsistent with its source text. Because most inconsistencies involve incorrect entities, previous studies adopted span selection, replacing entities in the generated summary with entities from the source text. These studies assumed that all entities in the generated summary were inconsistent and tried to replace every entity. However, this was problematic: consistent entities could be replaced and masked, so the information they carried was lost. This paper proposes a method that sequentially executes a fact discriminator and a fact corrector to solve this problem. The fact discriminator determines which entities are inconsistent, and the fact corrector replaces only those. Since the fact corrector corrects only the inconsistent entities, it can exploit the consistent ones. Experiments show that the proposed method boosts the factual consistency of system-generated summaries and outperforms the baselines in terms of both automatic metrics and human evaluation.
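The discriminate-then-correct pipeline can be sketched with a deliberately naive toy: mark only the summary entities that do not appear in the source as inconsistent, and rewrite just those. Entity matching here is plain string lookup and the replacement policy is trivial; the paper uses learned models for both steps.

```python
# Toy sketch of the two-stage pipeline: a discriminator flags only the
# inconsistent entities, and a corrector replaces only those, leaving
# consistent entities (and their information) untouched.

def fact_discriminator(summary_entities, source_entities):
    """Return the subset of summary entities judged inconsistent."""
    return [e for e in summary_entities if e not in source_entities]

def fact_corrector(summary, inconsistent, source_entities):
    """Replace only inconsistent entities (toy policy: first source entity)."""
    for entity in inconsistent:
        summary = summary.replace(entity, source_entities[0])
    return summary

source_entities = ["Alice", "Seoul"]
summary = "Bob visited Seoul."
bad = fact_discriminator(["Bob", "Seoul"], source_entities)
print(bad)                                            # ['Bob']
print(fact_corrector(summary, bad, source_entities))  # Alice visited Seoul.
```

The key contrast with prior span-selection work is visible even in the toy: "Seoul" is consistent, so it is never masked or rewritten.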
Efficient Approach for Encoding and Compression of RDF Knowledge Bases
Tangina Sultana, Young-Koo Lee
http://doi.org/10.5626/JOK.2022.49.3.241
Due to the enormous growth of entity-centric search and natural language-based queries, the applicability of Knowledge Bases (KBs) is increasing exponentially, which demands efficient SPARQL query processing. Resource Description Framework (RDF) engines mostly employ order-, coordinate-, syntax-, and hash-based encoding for managing KBs. However, most current schemes do not achieve a high compression ratio, fast loading time, and efficient query performance at the same time. To address these concerns, we propose in this paper a novel approach that detects frequent and semantically related terms to achieve a higher compression ratio and enhance the performance of SPARQL queries on compressed and encoded data. The scheme is based on a dictionary encoding algorithm that combines statistical and semantic approaches. We also introduce a scheme for identifying infrequent terms based on their semantics. The system then assembles semantically related data into ontological classes, which further reduces the required memory footprint as well as the loading time. We analyzed and compared the performance of our proposed scheme with existing state-of-the-art approaches. The simulation results affirmed that our approach compresses and encodes KBs substantially better than existing systems.
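The statistical half of such a dictionary encoding can be sketched simply: assign the smallest integer IDs to the most frequent RDF terms so the common case encodes compactly. The semantic grouping of related and infrequent terms into ontological classes, which the paper adds on top, is omitted; the triples below are illustrative.

```python
from collections import Counter

# Toy sketch of frequency-aware dictionary encoding for RDF terms:
# frequent terms receive the smallest integer IDs, so the common case
# encodes compactly. (The paper's semantic grouping is omitted here.)

def build_dictionary(triples):
    terms = [t for triple in triples for t in triple]
    by_freq = [term for term, _ in Counter(terms).most_common()]
    return {term: i for i, term in enumerate(by_freq)}

def encode(triples, dictionary):
    return [tuple(dictionary[t] for t in triple) for triple in triples]

triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:bob", "rdf:type", "ex:Person"),
]
d = build_dictionary(triples)
print(encode(triples, d))
```

Queries then run over the small integer IDs rather than full IRI strings, which is what makes both the stored KB and the join operations cheaper.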
Privacy-preserving Pre-computation of Join Selectivity using Differential Privacy for the Proliferation of Pseudonymized Data Combination
Hyubjin Lee, Jong Seon Kim, Yon Dohn Chung
http://doi.org/10.5626/JOK.2022.49.3.250
With the enforcement of the three data acts, pseudonymized information from various domains can be joined through certified expert agencies. Before joining all the pseudonymized information, the expert agency provides a service that computes the join selectivity in advance. However, existing join selectivity pre-computation methods have vulnerabilities that can lead to privacy breaches. In this paper, we propose a privacy-preserving join selectivity pre-computation method. It anonymizes data through a one-way hash technique keyed with randomly generated one-time values provided by the expert agency, and it ensures differential privacy when pre-computing the join selectivity. The proposed method guarantees the anonymity of the data sent by the join-requesting institutions to the expert agency and prevents the privacy breaches that can occur with previous pre-computation methods. The experimental results showed that the proposed method provides effective join selectivity while satisfying differential privacy.
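The flow described above can be sketched end to end: each institution keys a one-way hash of its join keys with a one-time value from the expert agency (so raw identifiers never leave the institution), the agency counts the intersection of the hashed sets, and Laplace noise makes the released count differentially private. Parameter choices and the noise mechanism below are illustrative, not the paper's exact construction.

```python
import hashlib
import hmac
import random

# Toy sketch: keyed one-way hashing for anonymization, set intersection
# for the join selectivity count, Laplace noise for differential privacy.

def anonymize(keys, one_time_key):
    """Institutions send only keyed hashes of their join keys."""
    return {hmac.new(one_time_key, k.encode(), hashlib.sha256).hexdigest()
            for k in keys}

def dp_join_selectivity(hashed_a, hashed_b, epsilon):
    true_count = len(hashed_a & hashed_b)
    # Difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon).
    noise = random.expovariate(epsilon) - random.expovariate(epsilon)
    return true_count + noise

otk = b"one-time-key-from-expert-agency"   # illustrative key value
a = anonymize({"alice", "bob", "carol"}, otk)
b = anonymize({"bob", "carol", "dave"}, otk)
print(round(dp_join_selectivity(a, b, epsilon=1.0)))  # near 2
```

Because the key is single-use, hashes from different pre-computation rounds cannot be linked, which is what blocks the replay-style inference possible in the earlier methods.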
Parallel Optimization of Deep Learning Computation Offloading in Edge Computing Environment
Kwang Yong Shin, Soo-Mook Moon
http://doi.org/10.5626/JOK.2022.49.3.256
Computation offloading to edge servers has been proposed as a way to run computation-intensive deep learning applications on devices with limited hardware capabilities. However, the deep learning model must be uploaded to the edge server before computation offloading is possible, which is a non-trivial assumption in the edge server environment. Incremental offloading of neural networks was proposed as a solution, since it can upload the model and offload computation simultaneously [1]. Although it reduced the model upload time required for computation offloading, it did not properly handle the model creation overhead, which increased the time required to upload the entire model. This work solves the problem by optimizing model uploading and creation in parallel, decreasing the model upload time by up to 30% compared to the previous system.
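The overlap between uploading and model creation can be sketched as a producer-consumer pipeline: one thread "uploads" layers while the main thread "creates" each layer as soon as it arrives, instead of creating the model only after the full upload finishes. Layer names and timings are illustrative stand-ins.

```python
import queue
import threading
import time

# Toy sketch of the parallel optimization: overlap uploading each model
# layer with instantiating already-uploaded layers, instead of running
# the two phases back to back.

def upload_layers(layers, uploaded):
    for layer in layers:
        time.sleep(0.01)          # stand-in for network transfer
        uploaded.put(layer)
    uploaded.put(None)            # sentinel: upload finished

def create_model(uploaded, created):
    while (layer := uploaded.get()) is not None:
        time.sleep(0.01)          # stand-in for layer instantiation
        created.append(layer)

layers = ["conv1", "conv2", "fc"]
uploaded, created = queue.Queue(), []
t = threading.Thread(target=upload_layers, args=(layers, uploaded))
t.start()
create_model(uploaded, created)   # runs concurrently with uploading
t.join()
print(created)  # ['conv1', 'conv2', 'fc']
```

With n layers of upload time u and creation time c, the serial pipeline takes roughly n(u + c), while the overlapped one approaches n*max(u, c) plus one pipeline fill, which is where the reported upload-time reduction comes from.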
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal
Editorial Office
- Tel. +82-2-588-9240
- Fax. +82-2-521-1352
- E-mail. chwoo@kiise.or.kr