Learning Disentangled Representation of Web Addresses via Convolutional-Recurrent Triplet Network for Phishing URL Classification

Seok-Jun Bu; Hae-Jung Kim


Vol. 48, No. 2, pp. 147-153, Feb. 2021

10.5626/JOK.2021.48.2.147

phishing URL classification

convolutional-recurrent triplet network

deep metric learning

cyber-security


Abstract

Automated classification of phishing URLs spread through hyperlinks has become critical as the explosive growth of social media services strengthens personal connections online. Deep learning models based on convolutional-recurrent neural networks have achieved the best accuracy on phishing URL classification by modeling character-level and word-level features. However, a deep learning classifier that simply fits the given task on accumulated URLs is limited by class imbalance, because phishing URLs are generated and discarded almost immediately. We address this class imbalance by treating it as a deep learning-based URL feature space generation task. To alleviate the limitations of existing deep phishing classifiers, we propose a modified triplet network structure that explicitly learns the similarity between URLs based on Euclidean distance. In experiments on a real-world dataset of 60,000 collected URLs, the proposed method achieved the highest performance among the latest deep learning methods despite the severe class imbalance. We also demonstrate that the URL feature space generated by the proposed method improved recall by 45.85% compared to existing methods.
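The abstract does not spell out how the modified triplet network is trained, so the following is only a minimal sketch of the standard triplet margin loss over Euclidean distance, which is the general idea such networks build on. The function names, the `margin` value, and the toy two-dimensional embeddings are assumptions made for illustration; in the paper the embeddings would come from the convolutional-recurrent encoder, and the authors' modification is not reproduced here.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (an assumption, not the paper's exact loss).

    The loss is zero once the anchor is closer to the positive (same class)
    than to the negative (other class) by at least `margin`.
    """
    d_pos = euclidean_distance(anchor, positive)
    d_neg = euclidean_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings of three URLs.
anchor = [0.0, 0.0]      # e.g. a phishing URL
positive = [0.0, 1.0]    # another phishing URL, distance 1 from the anchor
negative = [3.0, 4.0]    # a benign URL, distance 5 from the anchor

# Well-separated triplet: 1 - 5 + 1 is negative, so the loss is clipped to 0.
well_separated = triplet_loss(anchor, positive, negative)

# Swapping the roles gives a violated triplet: 5 - 1 + 1 = 5.
violated = triplet_loss(anchor, negative, positive)
```

Minimizing a loss of this shape pulls URLs of the same class together and pushes the other class away in the feature space, which is one plausible reason a learned distance can help when phishing samples are scarce.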


Cite this article

[IEEE Style]

S. Bu and H. Kim, "Learning Disentangled Representation of Web Addresses via Convolutional-Recurrent Triplet Network for Phishing URL Classification," Journal of KIISE, JOK, vol. 48, no. 2, pp. 147-153, 2021. DOI: 10.5626/JOK.2021.48.2.147.

[ACM Style]

Seok-Jun Bu and Hae-Jung Kim. 2021. Learning Disentangled Representation of Web Addresses via Convolutional-Recurrent Triplet Network for Phishing URL Classification. Journal of KIISE, JOK, 48, 2, (2021), 147-153. DOI: 10.5626/JOK.2021.48.2.147.

[KCI Style]

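Seok-Jun Bu, Hae-Jung Kim, "Learning Disentangled Representation of Web Addresses via Convolutional-Recurrent Triplet Network for Phishing URL Classification," Journal of KIISE, Vol. 48, No. 2, pp. 147-153, 2021. DOI: 10.5626/JOK.2021.48.2.147.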



Journal of KIISE

ISSN : 2383-630X(Print)
ISSN : 2383-6296(Electronic)
KCI Accredited Journal
