
CoEM: Contrastive Embedding Mapper for Audio-visual Latents

Gihun Lee, Kyungchae Lee, Minchan Jeong, Myungjin Lee, Se-young Yun, Chan-hyun Yun

http://doi.org/10.5626/JOK.2023.50.1.80

Human perception can link audio and visual information, making it possible to recall visual information from audio information and vice versa. This ability is acquired naturally by experiencing situations in which the two kinds of information occur together. However, it is hard to obtain video datasets that are rich in both types of information and, at the same time, labeled with the semantics of each scene. This paper proposes a Contrastive Embedding Mapper (CoEM), which maps an embedding from one modality to the other according to its category. CoEM does not require paired data; instead, it learns to contrast the mapped embeddings by their categories. We validated the efficacy of CoEM on audio and visual embeddings trained to classify 20 shared categories. In our experiments, embeddings mapped by CoEM could be used to retrieve and generate data in the target domain.
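The abstract does not give CoEM's loss, so the following is only a minimal sketch of how category labels can replace paired samples, assuming a supervised-contrastive formulation. A hypothetical linear mapper projects source (audio) embeddings into the target (visual) space. Each mapped embedding then treats every target embedding of the same category as a positive and all others as negatives. The names `linear_map` and `category_contrastive_loss`, the temperature of 0.1, and the toy 2-D data are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def linear_map(weights: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Hypothetical mapper: a single linear layer computing W @ x (no bias)."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def normalize(v: Sequence[float]) -> Vector:
    """Scale v to unit L2 norm (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(a * b for a, b in zip(normalize(u), normalize(v)))


def category_contrastive_loss(
    mapped: Sequence[Sequence[float]],
    mapped_labels: Sequence[str],
    targets: Sequence[Sequence[float]],
    target_labels: Sequence[str],
    temperature: float = 0.1,
) -> float:
    """Supervised-contrastive-style loss without paired samples (an assumption).

    For each mapped source embedding, every target embedding with the same
    category label is a positive and every other target is a negative.
    Returns the mean over anchors that have at least one positive.
    """
    total, count = 0.0, 0
    for z, label in zip(mapped, mapped_labels):
        logits = [cosine(z, t) / temperature for t in targets]
        # Numerically stable log-sum-exp over all targets (softmax denominator).
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(s - peak) for s in logits))
        positives = [s for s, t_label in zip(logits, target_labels) if t_label == label]
        if not positives:
            continue  # no target of this category: skip this anchor
        # Average negative log-probability assigned to the positives.
        total += -sum(s - log_denom for s in positives) / len(positives)
        count += 1
    return total / count if count else 0.0


# Toy target (visual) embeddings for two shared categories.
visual = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
visual_labels = ["dog", "dog", "car", "car"]
# Unpaired source (audio) embeddings with their own category labels.
audio = [[2.0, 0.1], [0.1, 2.0]]
audio_labels = ["dog", "car"]

aligned = [[1.0, 0.0], [0.0, 1.0]]  # sends each category near its visual cluster
swapped = [[0.0, 1.0], [1.0, 0.0]]  # sends each category onto the wrong cluster

good = category_contrastive_loss(
    [linear_map(aligned, a) for a in audio], audio_labels, visual, visual_labels
)
bad = category_contrastive_loss(
    [linear_map(swapped, a) for a in audio], audio_labels, visual, visual_labels
)
print(good < bad)  # True
```

Because positives are defined by shared labels rather than by co-occurring clips, the audio and visual sets need not be aligned or even the same size. In a real setup the mapper weights would be learned by gradient descent on a loss of this kind, which the sketch omits.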


Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr