TY  - JOUR
T1  - Creating a Noisy Environment Speech Mixture Dataset for Korean Speech Separation
AU  - Jang, Jaehoo
AU  - Park, Kun
AU  - Lee, Jeongpil
AU  - Koo, Myoung-Wan
JO  - Journal of KIISE, JOK
PY  - 2024
DA  - 2024/1/14
DO  - 10.5626/JOK.2024.51.6.513
KW  - noise data
KW  - overlapping sounds
KW  - speech overlap
KW  - speech separation
KW  - sound source separation
KW  - speech recognition
AB  - In the field of speech separation, models are typically trained on datasets that contain mixtures of overlapping speech and noise. Although established international datasets exist for advancing speech separation techniques, no comparable dataset of overlapping speech and noise has yet been constructed for Korean. This paper therefore presents a dataset generator designed for single-channel Korean speech separation models and introduces the Korean Speech mixture with Noise dataset, which was built with this generator. In our experiments, we train and evaluate a Conv-TasNet speech separation model on the new dataset. We further verify the dataset's efficacy by using a pre-trained speech recognition model to compare the Character Error Rate (CER) obtained on the separated speech with that obtained on the original speech.
ER  - 