Search : [ author: Sang-won Lee ] (7)

A Study on Buffer Management for Read-Once Pages Using a Read Buffer

Seongjae Moon, Sang-Won Lee, Young Ik Eom

http://doi.org/10.5626/JOK.2025.52.7.611

Relational database systems store pages in main memory to minimize storage access and improve transaction throughput. However, read-once pages, which are referenced only once before eviction, may force dirty pages to be flushed, reducing the page hit ratio. In SSD-based DBMSs, read-once pages also cause I/O serialization, forcing faster read operations to wait for slower write operations to complete. We analyze the I/O serialization caused by read-once pages and their characteristics, and propose a buffer management scheme that isolates read-once pages in a read buffer. In the TPC-C benchmark, dirty pages evicted due to read-once pages account for about 8.9% of all flush operations. By isolating read-once pages in the read buffer, we reduced single-page flushes by 56% and the page miss ratio in the normal buffer by 32%, while increasing transaction throughput by 8% compared to vanilla MySQL.
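
The idea of keeping read-once pages out of the main buffer can be illustrated with a toy model. The sketch below is not the scheme from the paper: the class name `TwoPoolBuffer`, the FIFO read buffer, and the rule "promote a page on its second reference" are all assumptions chosen for illustration.

```python
from collections import OrderedDict


class TwoPoolBuffer:
    """Toy buffer manager (hypothetical): pages read for the first time go to
    a small read buffer and are promoted to the normal LRU buffer only when
    they are referenced again."""

    def __init__(self, normal_capacity, read_capacity):
        self.normal = OrderedDict()  # page_id -> dirty flag, kept in LRU order
        self.read = OrderedDict()    # page_id -> None, kept in FIFO order
        self.normal_capacity = normal_capacity
        self.read_capacity = read_capacity
        self.dirty_flushes = 0       # dirty pages written back on eviction

    def access(self, page_id, write=False):
        if page_id in self.normal:
            self.normal.move_to_end(page_id)
            self.normal[page_id] = self.normal[page_id] or write
            return "hit"
        if page_id in self.read:
            # Second reference: the page is not read-once, so promote it.
            del self.read[page_id]
            self._insert_normal(page_id, write)
            return "hit"
        if write:
            # Assumption: pages first touched by a write skip the read buffer.
            self._insert_normal(page_id, True)
        else:
            if len(self.read) >= self.read_capacity:
                # Read-buffer pages are clean, so dropping one needs no flush.
                self.read.popitem(last=False)
            self.read[page_id] = None
        return "miss"

    def _insert_normal(self, page_id, dirty):
        if len(self.normal) >= self.normal_capacity:
            _, was_dirty = self.normal.popitem(last=False)
            if was_dirty:
                self.dirty_flushes += 1
        self.normal[page_id] = dirty
```

In this model, a long run of one-time reads cycles through the read buffer only, so dirty pages in the normal buffer are not flushed to make room for them.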

Addressing Write-Warm Pages in OLTP Workloads

Kyong-Shik Lee, Mijin An, Sang-Won Lee

http://doi.org/10.5626/JOK.2023.50.11.1002

One of the most important purposes of a buffer management policy is to cache frequently accessed data in the buffer pool to minimize disk I/O. However, even if frequently referenced pages are cached effectively, a small number of pages can still cause excessive disk I/O. This is because of write-warm pages, which are repeatedly fetched into and evicted from the buffer pool. In this paper, we introduce the “(Write-)Warm Page Thrashing” problem and confirm the existence of write-warm pages. Specifically, we found that 10% of flushed pages accounted for 41% of writes. This can degrade performance, particularly on flash memory devices with slow write speeds. Therefore, a new buffer management policy is required to detect and prevent such thrashing.
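
A statistic of the form "10% of flushed pages accounted for 41% of writes" can be computed from a flush trace as sketched below. This is a minimal illustration, not the measurement method used in the paper: the function name `top_page_write_share` and the trace format (one page ID per flush) are assumptions.

```python
import math
from collections import Counter


def top_page_write_share(flush_trace, fraction=0.10):
    """Return the share of all flushes that hit the most-flushed `fraction`
    of distinct pages. `flush_trace` holds one page ID per flush."""
    counts = Counter(flush_trace)
    if not counts:
        return 0.0
    # Number of distinct pages in the top group (at least one page).
    k = max(1, math.ceil(len(counts) * fraction))
    top_writes = sum(c for _, c in counts.most_common(k))
    return top_writes / len(flush_trace)
```

For example, in a trace with ten distinct pages where one page is flushed five times and the other nine once each, the top 10% of pages (one page) accounts for 5 of 14 flushes.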

Database Tuning Techniques to Mitigate SSD-internal Interference among Multi-tenant Databases

Seung-Jin Oh, Jong-Hyeok Park, Sang-Won Lee

http://doi.org/10.5626/JOK.2022.49.5.388

In a multi-tenant environment, tenants share an SSD (Solid State Drive) as their storage device. Tenants with different I/O characteristics can interfere with each other at the channel level in terms of storage performance. In this paper, to harness the full potential of the channel-level parallelism of SSDs, we proposed two tuning techniques: page size alignment and increasing the readahead size. We measured transaction throughput and latency (execution time) while running Linkbench and TPC-H simultaneously in a Docker container-based environment. Our evaluation showed that page size alignment reduced unnecessary data padding/division overhead and prevented unnecessary I/O requests from occupying the channel, which mitigated interference and improved the performance of both Linkbench and TPC-H. Increasing the readahead size, however, raised the SSD-internal channel occupancy of sequential read requests and reduced the interference from Linkbench, whose requests were small and random. Thus, it improved only TPC-H, in terms of query execution performance.

A Compression-based Data Consistency Mechanism for File Systems

Dong Hyun Kang, Sang-Won Lee, Young Ik Eom

http://doi.org/10.5626/JOK.2019.46.9.885

A data consistency mechanism is a crucial component of any file system; it prevents data corruption caused by system crashes or power failures. For the sake of performance, the default journal mode of the Ext4 file system guarantees only the consistency of metadata, compromising the consistency of normal data. Specifically, it does not guarantee full consistency of all the data in the file system. In this paper, we propose a new crash consistency scheme that guarantees the strong data consistency of the data journal mode while still providing performance higher than or comparable to that of the weak default journal mode of the Ext4 file system. By leveraging a compression mechanism, the proposed scheme can halve the amount of write operations as well as the number of fsync() system calls. To evaluate performance, we modified the code related to jbd2 and compared the proposed scheme with two journaling modes of Ext4 on SSD and HDD. The results clearly confirm that the proposed scheme outperforms the default journal mode by 8.3 times.

Implementation of a Prefetch method for Secondary Index Scan in MySQL InnoDB Engine

Dasom Hwang, Sang-Won Lee

http://doi.org/

Flash SSDs have many advantages over hard disks, such as energy efficiency, shock resistance, and high I/O throughput. For these reasons, together with the emergence of innovative technologies such as 3D-NAND and V-NAND that lower the cost per byte, flash SSDs have been rapidly replacing hard disks in many areas. However, existing database engines, which were developed mainly with hard disks in mind as the storage, cannot fully exploit the characteristics of flash SSDs (e.g., internal parallelism). In this paper, to utilize the internal parallelism intrinsic to modern flash SSDs for faster query processing, we implemented a prefetching method that uses asynchronous I/O as a new functionality for secondary index scans in the MySQL InnoDB engine. Compared to the original InnoDB engine, the proposed prefetching-based scan scheme shows three-fold higher performance with 16KB pages and about 4.2-fold higher performance with 4KB pages.
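
The general idea of keeping several page reads in flight, rather than fetching one page at a time, can be sketched as follows. This is not the InnoDB implementation described in the paper: a thread pool stands in for asynchronous I/O, and `PAGE_SIZE`, `prefetching_scan`, `sequential_scan`, and `make_demo_file` are hypothetical names for the demo. The sketch assumes a POSIX system, since it relies on `os.pread`.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # hypothetical page size for the demo


def read_page(fd, page_no):
    """Read one page at its file offset; pread does not move the file position."""
    return os.pread(fd, PAGE_SIZE, page_no * PAGE_SIZE)


def sequential_scan(fd, page_numbers):
    """Baseline: fetch the pages one at a time, in the order given."""
    return [read_page(fd, p) for p in page_numbers]


def prefetching_scan(fd, page_numbers, depth=8):
    """Fetch the same pages with up to `depth` reads in flight at once.
    Results come back in the order of `page_numbers`."""
    with ThreadPoolExecutor(max_workers=depth) as pool:
        return list(pool.map(lambda p: read_page(fd, p), page_numbers))


def make_demo_file(num_pages):
    """Create a temp file in which page i is filled with the byte value i % 256."""
    f = tempfile.NamedTemporaryFile(delete=False)
    with f:
        for i in range(num_pages):
            f.write(bytes([i % 256]) * PAGE_SIZE)
    return f.name
```

A secondary index scan typically produces page numbers in a scattered order, for example `[5, 1, 9, 14]`; passing such a list to `prefetching_scan` lets the device serve the reads concurrently, which is where an SSD's internal parallelism can help.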

External Merge Sorting in Tajo with Variable Server Configuration

Jongbaeg Lee, Woon-hak Kang, Sang-won Lee

http://doi.org/

There is a growing demand for big data processing, which extracts valuable information from large amounts of data. The Hadoop system employs the MapReduce framework to process big data. However, MapReduce has limitations such as inflexible and slow data processing. To overcome these drawbacks, SQL query processing techniques known as SQL-on-Hadoop were developed. Apache Tajo, one of the SQL-on-Hadoop systems, was developed by a Korean development group. External merge sort is one of the most heavily used algorithms in Tajo's query processing. Its performance is influenced by two parameters: sort buffer size and fanout. In this paper, we analyzed the performance of external merge sort in Tajo with various sort buffer sizes and fanouts. In addition, we found two major causes of differences in its performance: CPU cache misses, which increase as the sort buffer size grows, and the number of merge passes, which is determined by the fanout.
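
How sort buffer size and fanout determine the number of merge passes can be seen in the textbook model of external merge sort, sketched below. This is the standard model, not Tajo's actual code, and the function name `merge_passes` and its parameters are assumptions: the input is first split into sorted runs of one sort buffer each, and every merge pass then combines up to `fanout` runs into one.

```python
def merge_passes(data_size, sort_buffer_size, fanout):
    """Number of merge passes needed in the textbook external merge sort.

    data_size and sort_buffer_size use the same unit (e.g., MB or rows).
    """
    if data_size < 0 or sort_buffer_size <= 0:
        raise ValueError("sizes must be positive")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    # Initial sorted runs: ceil(data_size / sort_buffer_size).
    runs = -(-data_size // sort_buffer_size)
    passes = 0
    while runs > 1:
        # Each pass merges groups of up to `fanout` runs into single runs.
        runs = -(-runs // fanout)
        passes += 1
    return passes
```

For instance, 1000 units of data with a sort buffer of 10 units yields 100 initial runs; a fanout of 10 needs two merge passes, while a fanout of 100 needs only one. A larger sort buffer also reduces the number of runs, but, as the abstract notes, it can increase CPU cache misses during the in-memory sort.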

Performance Analysis of Flash Memory SSD with Non-volatile Cache for Log Storage

Dae-Yong Hong, Gi-Hwan Oh, Woon-Hak Kang, Sang-Won Lee

http://doi.org/

In a database system, updates made to pages by a transaction must be stored in secondary storage before the commit completes. Generic secondary storage devices have volatile DRAM caches to hide the long latency of non-volatile media. However, because logs that are written only to the volatile DRAM cache do not ensure durability, logging latency cannot be hidden. Recently, a flash SSD with a capacitor-backed DRAM cache was developed to overcome this shortcoming. Storage devices with such a non-volatile cache can increase transaction throughput because transactions can commit as soon as the logs reach the cache. In this paper, we analyzed the transaction throughput achieved when an SSD with a capacitor-backed DRAM cache was used as log storage. Transaction throughput can be improved more than threefold by committing right after the logs are stored in the DRAM cache rather than in a secondary storage device. We also showed that, with proper tuning, it could achieve over 73% of the ideal logging performance.


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal
