Performance Improvement of LSM-tree Using Partial Flushing of MemTable
Hyeongjun Jeon, Hera Koo, Sungho Moon, Beomseok Nam
http://doi.org/10.5626/JOK.2023.50.1.87
A key-value store, one type of NoSQL database, commonly uses a Log-Structured Merge Tree (LSM Tree) as its index data structure. The LSM Tree normally offers good write performance, but two chronic problems, write amplification and write stalls, limit that performance. In this paper, we introduce the Extended MemTable, an enlarged version of the LSM Tree's MemTable that takes advantage of the growing main memory capacity of recent datacenters. The Extended MemTable is divided into partitions by key range and performs flushes in a way that lets compaction run efficiently. By significantly reducing write amplification and write stalls, it increases write throughput by up to 2x and read throughput by up to 4x, and reduces write amplification by up to 3.7x, compared to the original RocksDB.
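To make the idea of a key-range-partitioned MemTable concrete, here is a minimal sketch in Python. It is not the paper's implementation: the class name `PartitionedMemTable`, the entry-count `capacity`, and the "flush the fullest partition" policy are all assumptions chosen for illustration, since the abstract does not say how a partition is selected for flushing.

```python
import bisect

class PartitionedMemTable:
    """Toy in-memory table split into key-range partitions (hypothetical)."""

    def __init__(self, boundaries, capacity):
        # n sorted split keys define n + 1 key-range partitions.
        self.boundaries = list(boundaries)
        self.partitions = [dict() for _ in range(len(self.boundaries) + 1)]
        self.capacity = capacity  # assumed limit on total entries
        self.flushed = []         # stands in for sorted runs written to storage

    def _index(self, key):
        # Locate the partition whose key range contains `key`.
        return bisect.bisect_right(self.boundaries, key)

    def __len__(self):
        return sum(len(p) for p in self.partitions)

    def put(self, key, value):
        self.partitions[self._index(key)][key] = value
        if len(self) > self.capacity:
            self._flush_one()

    def _flush_one(self):
        # Assumed policy: flush only the fullest partition, so the
        # flushed run covers a single narrow key range.
        victim = max(range(len(self.partitions)),
                     key=lambda i: len(self.partitions[i]))
        self.flushed.append(sorted(self.partitions[victim].items()))
        self.partitions[victim] = {}

    def get(self, key):
        # Looks in memory only; a real store would also search storage.
        return self.partitions[self._index(key)].get(key)

table = PartitionedMemTable(boundaries=["h", "p"], capacity=4)
for k in ["a", "b", "c", "q", "z"]:
    table.put(k, k.upper())
print(table.flushed)   # one sorted run holding keys a, b, c
print(len(table))      # entries for q and z stay in memory
```

Because only one key range is written out at a time, the remaining partitions keep absorbing writes, which is one plausible reading of how partial flushing could ease write stalls.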
LSM Tree Compaction Offloading Using NVMe-oF
Sungho Moon, Hera Koo, Hyeongjun Jeon, Beomseok Nam
http://doi.org/10.5626/JOK.2022.49.7.569
NVMe-over-Fabrics (NVMe-oF) is drawing attention in the industry as an option for disaggregated storage because it provides fast access to remote NVMe SSDs through NVMe commands. In this paper, we propose RocksDB-oF, an LSM-Tree-based key-value store optimized for disaggregated storage using NVMe-oF. RocksDB-oF alleviates the write stall problem by offloading compaction from the computing node to the storage node, taking the characteristics of NVMe-oF into account. In addition, a file system built on the Storage Performance Development Kit (SPDK) solves the file system consistency problem that arises when two nodes access the same NVMe SSD at the same time. In experiments in a disaggregated storage environment with NVMe-oF, RocksDB-oF showed higher write throughput than legacy RocksDB.
Optimization of Load Balancing on LSM-Tree based Distributed Key-Value Store using NVMe-oF
Hera Koo, Sungho Moon, Hyeongjun Jeon, Beomsuk Nam
http://doi.org/10.5626/JOK.2022.49.7.561
One of the challenges for distributed key-value databases, which distribute and store data according to key, is load balancing. In this paper, we propose MongoRocks-oF, a redesign of MongoRocks (the MongoDB distributed database running on the RocksDB engine, an LSM-tree-based key-value store) that optimizes load balancing with NVMe-over-Fabrics. NVMe-over-Fabrics lets computing nodes share multiple NVMe SSDs in remote storage through all-to-all connections; MongoRocks-oF uses this to distribute data evenly in a round-robin fashion and to fully utilize storage resources. With this design, MongoRocks-oF improves on the load balance of legacy MongoRocks and shows better write performance.
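The contrast between key-based and round-robin placement can be illustrated with a small sketch. The function names, the integer key space, and the skewed workload below are hypothetical; the abstract does not state what unit of data MongoRocks-oF distributes, so this only shows why round-robin placement stays balanced when keys are skewed.

```python
from collections import Counter
from itertools import cycle

def place_by_key_range(keys, num_targets, key_space):
    """Key-based placement: each target owns a contiguous slice of the key space."""
    width = key_space // num_targets
    return Counter(min(k // width, num_targets - 1) for k in keys)

def place_round_robin(keys, num_targets):
    """Round-robin placement: successive writes go to successive targets."""
    targets = cycle(range(num_targets))
    return Counter(next(targets) for _ in keys)

# Skewed workload: 90 of 100 keys fall in the lowest quarter of the key space.
keys = list(range(90)) + [500, 600, 700, 800, 900, 950, 960, 970, 980, 990]

by_range = place_by_key_range(keys, num_targets=4, key_space=1000)
by_rr = place_round_robin(keys, num_targets=4)

print(sorted(by_range.items()))  # target 0 receives 90 of the 100 keys
print(sorted(by_rr.items()))     # every target receives 25 keys
```

Round-robin placement requires that any computing node can reach any storage target, which is the role the abstract assigns to the all-to-all NVMe-over-Fabrics connections.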
Opt Tree: Write Optimized Tree Using Optane DCPM Internal Buffer
http://doi.org/10.5626/JOK.2021.48.7.742
Intel’s Optane DC Persistent Memory, a recently commercialized non-volatile byte-addressable memory, has an internal buffer of 256 bytes called XPLine, which processes memory access commands in units of cache lines or words. In this paper, we propose Opt Tree, a novel byte-addressable persistent index that utilizes the internal buffer of the Optane DCPM. Opt Tree divides the tree node into several small blocks of 256 bytes. For insertions and searches, Opt Tree accesses only two blocks. In our performance study, Opt Tree shows better insertion performance than the existing persistent indexes through its internal buffer-friendly design.
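The 256-byte granularity can be made concrete with a little arithmetic. The sketch below is only an illustration: the 64-byte cache-line size is an assumption about the CPU, and `blocks_touched` is a hypothetical helper, not part of Opt Tree. It computes which 256-byte blocks a given byte range falls into.

```python
XPLINE = 256      # internal buffer size stated in the abstract, in bytes
CACHE_LINE = 64   # assumed CPU cache-line size, in bytes

def blocks_touched(offset, length, block=XPLINE):
    """Return indices of the block-sized units covered by [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block
    last = (offset + length - 1) // block
    return list(range(first, last + 1))

print(XPLINE // CACHE_LINE)       # 4 cache lines fit in one 256-byte block
print(blocks_touched(0, 64))      # [0]: one cache line inside block 0
print(blocks_touched(240, 32))    # [0, 1]: an access straddling two blocks
print(blocks_touched(512, 256))   # [2]: exactly one aligned block
```

An access that stays inside one aligned 256-byte block touches a single buffer-sized unit, which suggests why a node layout built from such blocks would suit the internal buffer.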
Xpass: NUMA-aware Persistent Memory Disaggregation
Jaeyoun Nam, Hokeun Cha, ByeongKeon Lee, Beomseok Nam
http://doi.org/10.5626/JOK.2021.48.7.735
Disaggregation is used for efficient resource management in large-scale data centers, where each server consists of multiple NUMA nodes. In the NUMA architecture, the latency difference between remote and local access is known to be significant. In particular, the latency of remote NUMA access to persistent memory is even higher than that of remote access to DRAM. In this study, we propose Xpass, a persistent memory disaggregation framework that takes the locality of the NUMA architecture into account. Xpass uses the dynamic hash table CCEH to manage cached pages and proposes a segment split algorithm that balances load across the NUMA nodes.
NAFS: Stackable Filesystem for Maximizing Local Access in a NUMA System
Seungjun Ha, Hobin Woo, Euiseong Seo, Beomseok Nam
http://doi.org/10.5626/JOK.2021.48.6.612
Intel Optane DC Persistent Memory has read/write latencies comparable to DRAM while ensuring data persistence, as block devices such as SSDs do. However, Optane DC PM modules are installed in the DIMM slots of NUMA nodes, whereas legacy block devices are attached via PCIe or SATA. Optane DC PMs are therefore known to suffer from NUMA effects, and the performance of a multithreaded application depends on NUMA locality. In this paper, we propose a novel stackable file system, NUMA-Aware Filesystem (NAFS). NAFS divides a file into segments so that I/Os can be performed on the local NUMA node where each application thread runs. To enable this, NAFS duplicates the file metadata across all NUMA nodes if the number of remote I/Os exceeds a certain threshold. Our performance study shows that NAFS significantly reduces the number of accesses to remote NUMA nodes, improving the performance of multithreaded applications.
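The threshold rule for metadata duplication can be sketched as a simple counter. Everything named here (`FileMetadataPlacement`, `home_node`, the threshold value) is hypothetical and exists only to illustrate the rule stated in the abstract; it does not reflect how NAFS tracks I/Os internally.

```python
class FileMetadataPlacement:
    """Tracks which NUMA nodes hold a copy of one file's metadata (hypothetical)."""

    def __init__(self, home_node, num_nodes, threshold):
        self.replicas = {home_node}   # metadata starts on a single node
        self.num_nodes = num_nodes
        self.threshold = threshold    # assumed tunable limit on remote I/Os
        self.remote_ios = 0

    def access(self, thread_node):
        """Record one I/O from a thread on thread_node; return True if it was local."""
        if thread_node in self.replicas:
            return True
        self.remote_ios += 1
        if self.remote_ios > self.threshold:
            # Threshold exceeded: duplicate the metadata onto every NUMA node.
            self.replicas = set(range(self.num_nodes))
        return False

meta = FileMetadataPlacement(home_node=0, num_nodes=2, threshold=3)
results = [meta.access(thread_node=1) for _ in range(6)]
print(results)  # the first four accesses are remote, the last two are local
```

The counter delays duplication until remote traffic is heavy enough to justify the extra copies, at the cost of a few remote accesses up front.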
LFA-SkipList: Optimizing SkipList by Reducing Access in a NUMA-Aware System
Sunghwan Ahn, Yujin Jang, Seungjun Ha, Beomseok Nam
http://doi.org/10.5626/JOK.2021.48.1.1
Intel's Optane DC Persistent Memory is a non-volatile memory that is faster than storage devices and stores data persistently. However, in a NUMA system, accessing the remote memory of another CPU socket incurs higher latency than local NUMA access. Performance therefore degrades when a SkipList is spread across multiple non-volatile memories. In this paper, we propose the LFA-SkipList to solve this problem. The LFA-SkipList adds a local pointer to each node and uses it to visit local nodes first and remote nodes afterward, thereby reducing unnecessary remote node accesses and improving performance. In our evaluation, the LFA-SkipList showed a much shorter search time than the legacy SkipList.
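The effect of a local pointer can be illustrated on a heavily simplified structure. The sketch below is not the LFA-SkipList algorithm: it uses a single sorted linked list rather than a multi-level skip list, and the `local_next` field, the `build` helper, and the remote-access counter are all assumptions made for illustration. It shows only that following same-node pointers first can reduce the number of remote nodes a search visits.

```python
class Node:
    def __init__(self, key, numa):
        self.key, self.numa = key, numa
        self.next = None        # next node in key order, on any NUMA node
        self.local_next = None  # next node in key order on the same NUMA node

def build(keys_with_numa):
    """Link sorted (key, numa) pairs; return (head, first node per NUMA node)."""
    nodes = [Node(k, n) for k, n in keys_with_numa]
    for prev, cur in zip(nodes, nodes[1:]):
        prev.next = cur
    last_on, local_heads = {}, {}
    for node in nodes:
        if node.numa in last_on:
            last_on[node.numa].local_next = node
        else:
            local_heads[node.numa] = node
        last_on[node.numa] = node
    return nodes[0], local_heads

def search(head, local_heads, key, my_numa, use_local):
    """Return (found, remote_accesses) for a thread running on my_numa."""
    remote = 0

    def read(n):
        # Reading a node's key counts as remote if it lives on another NUMA node.
        nonlocal remote
        if n.numa != my_numa:
            remote += 1
        return n.key

    node = head
    if use_local and my_numa in local_heads and read(local_heads[my_numa]) <= key:
        # Phase 1: advance through local nodes only, as far as possible.
        node = local_heads[my_numa]
        while node.local_next is not None and read(node.local_next) <= key:
            node = node.local_next
    # Phase 2: finish with the ordinary pointers, which may cross NUMA nodes.
    while node is not None and read(node) < key:
        node = node.next
    return node is not None and node.key == key, remote

layout = [(1, 0), (2, 1), (3, 1), (4, 0), (5, 1), (6, 1), (7, 0), (8, 1)]
head, local_heads = build(layout)
print(search(head, local_heads, 6, my_numa=0, use_local=False))  # (True, 4)
print(search(head, local_heads, 6, my_numa=0, use_local=True))   # (True, 2)
```

In this toy layout the local chain lets the search skip over keys 2 and 3 without touching the other NUMA node, while the final steps to key 6 still require remote reads.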
PSL-DB: Non-Volatile Memory-optimized LSM-Tree with Skip List
Chanyeol Park, Dongui Kim, Beomseok Nam
http://doi.org/10.5626/JOK.2020.47.7.635
With the release of Intel's Optane DC Persistent Memory, non-volatile memory, which offers higher capacity than DRAM and higher performance than SSDs and HDDs, is in the spotlight as the next generation of storage devices. In this paper, we propose the Persistent Skip List DataBase (PSL-DB), a key-value store optimized for the Optane DCPM in app-direct mode. PSL-DB uses a byte-addressable skip list that significantly reduces I/O traffic by avoiding redundant writes. PSL-DB also does not trade write performance for read performance, because it does not throttle writes with artificial governors. In our experiments using Intel Optane DC Persistent Memory, PSL-DB shows significantly higher query processing throughput than legacy LevelDB storing its SSTables in Optane DC PM.
Journal of KIISE
- ISSN : 2383-630X(Print)
- ISSN : 2383-6296(Electronic)
- KCI Accredited Journal