Purpose The purpose of this paper is to develop a method to improve topic representation by incorporating the TextRank technique in Bertopic-based topic modeling and additional indicators for determining the optimal number of topics.
Design/methodol...
Purpose The purpose of this paper is to develop a method to improve topic representation by incorporating the TextRank technique in Bertopic-based topic modeling and additional indicators for determining the optimal number of topics.
Design/methodology/approach In this paper, we propose a method to extract important documents from documents assigned to each topic of a topic model using the TextRank technique, and to calculate secondary diversity and generate topic representations based on the results. First, we integrate the TextRank algorithm into the Bertopic-based topic modeling process to set local secondary labels for each topic. The secondary labels of each topic are derived through extractive summarization based on the TextRank algorithm. Second, we improve the accuracy of selecting the optimal number of topics by calculating the secondary diversity index based on the extractive summary results of each topic. Third, we improve the efficiency by utilizing ChatGPT when deriving the labels of each topic.
Findings As a result of performing case analysis and analysis evaluation using the proposed method, it was confirmed that topic representation based on TextRank results generated more accurate topic labels and that the secondary diversity index was a more effective index for determining the optimal number of topics.