Search : [ author: 서정연 ] (4)

Political Bias in Large Language Models and its Implications on Downstream Tasks

Jeongyeon Seo, Sukmin Cho, Jong C. Park

http://doi.org/10.5626/JOK.2025.52.1.18

This paper contains examples of political leaning bias that may be offensive.

Abstract: As the performance of Large Language Models (LLMs) improves, direct interaction with users becomes possible, raising ethical issues. In this study, we design two experiments to explore the diverse spectrum of political stances that an LLM exhibits and how these stances affect downstream tasks. We first define the inherent political stances of the LLM as the baseline and compare the results from three different inputs (jailbreak, political persona, and jailbreak persona). The experiments show that the political stances of the LLM changed the most under the jailbreak attack, while smaller changes were observed with the other two inputs. Moreover, an experiment on downstream tasks demonstrated that the altered distribution of inherent political stances can affect the outcomes of these tasks. These results suggest that the model generates responses that align more closely with its inherent stance than with the user's intention to personalize responses. We conclude that the model's intrinsic political bias and the judgments it makes should be explicitly communicated to users.
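One minimal way to quantify such stance shifts is to compare questionnaire runs numerically. The sketch below assumes answers are coded on a Likert scale (e.g. -2 to 2); both the metric and the `compare_conditions` helper are illustrative assumptions, not the paper's exact measure:

```python
def stance_shift(baseline, conditioned):
    """Mean absolute shift between two runs of a political-orientation
    questionnaire, each given as a list of Likert-coded answers (e.g. -2..2)."""
    assert len(baseline) == len(conditioned), "runs must answer the same items"
    return sum(abs(b - c) for b, c in zip(baseline, conditioned)) / len(baseline)

def compare_conditions(baseline, runs):
    """Rank input conditions (e.g. jailbreak, persona, jailbreak persona)
    by how far they move the model from its inherent (baseline) stance."""
    shifts = {name: stance_shift(baseline, answers) for name, answers in runs.items()}
    return sorted(shifts.items(), key=lambda kv: kv[1], reverse=True)
```

With toy answers, a jailbreak run that flips several items would rank above a persona run that changes only one.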

A Named-Entity Recognition Training Method Using Bagging-Based Bootstrapping

Yujin Jeong, Juae Kim, Youngjoong Ko, Jungyun Seo

http://doi.org/10.5626/JOK.2018.45.8.825

Most previous named-entity (NE) recognition studies have been based on supervised learning. Although supervised NE recognition performs well, constructing a large labeled corpus requires considerable time and cost. In this paper, we propose an NE recognition training method that uses an automatically generated labeled corpus to address this problem. Since the proposed method uses a large machine-labeled corpus, it can greatly reduce the time and cost of generating a labeled corpus manually. In addition, a bagging-based bootstrapping technique is applied to correct errors in the machine-labeled data. Experimental results show that the proposed method achieves its highest F1 score of 70.76% with the bagging-based bootstrapping technique, which is 5.17%p higher than that of the baseline system.
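The error-correction idea can be sketched with stdlib-only toy code: train several taggers on bootstrap resamples of the machine-labeled corpus, then replace each noisy tag by the taggers' majority vote. The trivial most-frequent-tag tagger stands in for a real NE model (an assumption for illustration; the paper's actual recognizer is not shown):

```python
import random
from collections import Counter, defaultdict

def train_tagger(sentences):
    """Toy NE tagger: remember the most frequent tag seen for each token."""
    counts = defaultdict(Counter)
    for tokens, tags in sentences:
        for tok, tag_ in zip(tokens, tags):
            counts[tok][tag_] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in counts.items()}

def tag(model, tokens):
    # Unknown tokens default to the outside tag "O".
    return [model.get(tok, "O") for tok in tokens]

def bagging_bootstrap(machine_labeled, n_bags=5, seed=0):
    """Correct noisy machine labels by majority vote over bagged taggers."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        # Bootstrap resample: draw sentences with replacement.
        sample = [rng.choice(machine_labeled) for _ in machine_labeled]
        models.append(train_tagger(sample))
    corrected = []
    for tokens, _noisy_tags in machine_labeled:
        votes = [tag(m, tokens) for m in models]
        # Per-token majority vote across the bagged taggers.
        new_tags = [Counter(col).most_common(1)[0][0] for col in zip(*votes)]
        corrected.append((tokens, new_tags))
    return corrected
```

Because each bag resamples the corpus, a label that contradicts the corpus-wide evidence for a token tends to be outvoted and corrected.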

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods

Sunjae Kwon, Juae Kim, Sangwoo Kang, Jungyun Seo

http://doi.org/

A community-based question answering system provides answers to questions from documents uploaded to web communities. To enhance question analysis, previous methods have developed rules specific to a target domain or applied machine learning to only part of the process. However, these methods incur excessive cost when expanding to new domains, or lead to systems overfitted to a specific domain. This paper proposes a multiple machine learning method that automates the overall process by adopting an appropriate machine learning technique at each step, for efficient processing in a community-based question answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of a question focus extractor, which analyzes the focused phrases in questions using conditional random fields, and a question type classifier, which classifies the topics of questions using a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, there are many cases in which the results of morphological analysis are unreliable for data uploaded to web communities. We therefore suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
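The character-feature idea can be illustrated with a small sketch: represent a question by its character bigrams, which sidesteps morphological-analysis errors entirely. The `NgramTypeClassifier` below is a toy overlap-based stand-in for the paper's SVM question-type classifier (an assumption for illustration, not the actual model):

```python
from collections import Counter

def char_ngrams(text, n=2):
    """Character n-grams; spaces become '_' so word boundaries are features too."""
    s = text.replace(" ", "_")
    return [s[i:i + n] for i in range(len(s) - n + 1)]

class NgramTypeClassifier:
    """Toy question-type classifier based on character-bigram overlap."""
    def __init__(self):
        self.profiles = {}

    def fit(self, questions, types):
        for q, t in zip(questions, types):
            self.profiles.setdefault(t, Counter()).update(char_ngrams(q))
        return self

    def predict(self, question):
        feats = Counter(char_ngrams(question))
        # Pick the type whose training n-gram profile overlaps the query most.
        def overlap(t):
            return sum(min(c, self.profiles[t][g]) for g, c in feats.items())
        return max(self.profiles, key=overlap)
```

Because the features are raw character sequences, the classifier needs no morphological analyzer at all, which is the point the abstract makes about noisy community text.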

Semantic Dependency Link Topic Model for Biomedical Acronym Disambiguation

Seonho Kim, Juntae Yoon, Jungyun Seo

http://doi.org/

Many important terminologies in biomedical text are expressed as abbreviations or acronyms. We propose a semantic dependency link topic model, based on the concepts of topics and dependency links, to disambiguate biomedical abbreviations and to cluster long-form variants of abbreviations that refer to the same sense. The model is a generative model inspired by the latent Dirichlet allocation (LDA) topic model, in which each document is viewed as a mixture of topics and each topic is characterized by a distribution over words. Thus, the words of a document are generated from its hidden topic structure, and the topic structure is inferred from the observable word sequences of document collections. In this study, we allow two distinct word-generation processes in order to incorporate semantic dependencies between words, particularly between expansions (long forms) of abbreviations and their sentential co-occurring words. Besides topic information, the semantic dependency between words is modeled as a link, and a new random variable for link presence is assigned to each word. As a result, the most probable expansions for the abbreviations in a given abstract are decided by the word-topic, document-topic, and word-link distributions estimated from the document collection through the semantic dependency link topic model. Abstracts retrieved from the MEDLINE Entrez interface by queries relating to 22 abbreviations and their 186 expansions were used as the data set. The link topic model correctly predicted the expansions of abbreviations with an accuracy of 98.30%.
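The decision rule can be sketched as follows: once the three distributions are estimated, each candidate long form is scored by combining its topical fit to the abstract with the strength of its dependency links to co-occurring words. This is a toy log-score in the spirit of the model, with hand-made distributions (the exact estimator and smoothing are assumptions, not the paper's):

```python
import math

def expansion_score(candidate_tokens, context_tokens, phi, theta, link):
    """Score one long-form candidate for an abbreviation.
    phi[z][w]  : word probability under topic z     (word-topic distribution)
    theta[z]   : topic probability for the abstract (document-topic distribution)
    link[w, c] : probability of a semantic dependency link between word w
                 and co-occurring context word c    (word-link distribution)"""
    score = 0.0
    for w in candidate_tokens:
        # Topical fit: marginalize the word over the document's topic mixture.
        p_topic = sum(theta[z] * phi[z].get(w, 1e-6) for z in range(len(theta)))
        # Link fit: strongest dependency link to any context word.
        p_link = max((link.get((w, c), 1e-6) for c in context_tokens), default=1e-6)
        score += math.log(p_topic) + math.log(p_link)
    return score

def disambiguate(candidates, context_tokens, phi, theta, link):
    """Pick the most probable expansion under the toy decision rule."""
    return max(candidates,
               key=lambda cand: expansion_score(cand.split(), context_tokens,
                                                phi, theta, link))
```

For an abstract about imaging, a candidate such as "magnetic resonance" would outscore "mitral regurgitation" for the abbreviation MR, since both its topic and its links to the context agree.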






Journal of KIISE

  • ISSN : 2383-630X(Print)
  • ISSN : 2383-6296(Electronic)
  • KCI Accredited Journal

Editorial Office

  • Tel. +82-2-588-9240
  • Fax. +82-2-521-1352
  • E-mail. chwoo@kiise.or.kr