SSC: Clustering Of Turkish Texts By Spectral Graph Partitioning


UÇKAN T., HARK C., KARCI A.

JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI, vol.24, no.4, pp.1433-1444, 2021 (ESCI)

  • Publication Type: Article / Full Article
  • Volume: 24, Issue: 4
  • Publication Date: 2021
  • DOI: 10.2339/politeknik.684558
  • Journal Name: JOURNAL OF POLYTECHNIC-POLITEKNIK DERGISI
  • Journal Indexes: Emerging Sources Citation Index (ESCI), TR DİZİN (ULAKBİM)
  • Pages: pp.1433-1444
  • Keywords: Graph partitioning, spectral graph theory, binary text clustering, text categorization, text mining
  • Affiliated with Van Yüzüncü Yıl University: Yes

Abstract

Interest in text classification is growing as a result of the exponential increase in the amount of available data. Many studies have addressed text clustering using different approaches. This study introduces Spectral Sentence Clustering (SSC), an unsupervised method based on graph partitioning, for text clustering problems. The study explains how the proposed model can be used in natural language applications to cluster texts successfully. A spectral graph theory method is used to partition the graph into non-intersecting sub-graphs, and an unsupervised and efficient solution to the text clustering problem is offered by providing a physical representation of the texts. Finally, tests demonstrate that SSC can be used successfully for text categorization. A clustering success rate of 97.08% was achieved in tests on the TTC-3600 dataset, which contains open-access, unstructured Turkish texts classified into categories. The proposed SSC model performed better than the popular k-means clustering algorithm.
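
The general idea of binary spectral graph partitioning mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' SSC implementation: the bag-of-words features, the cosine-similarity edge weights, the unnormalized Laplacian, and the function names `bag_of_words` and `spectral_bipartition` are all assumptions made for illustration. The sketch builds a similarity graph over texts and splits it in two by the sign of the Fiedler vector (the eigenvector of the second-smallest Laplacian eigenvalue).

```python
import numpy as np


def bag_of_words(texts):
    """Hypothetical helper: whitespace-tokenized term-count matrix."""
    vocab = sorted({w for t in texts for w in t.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    X = np.zeros((len(texts), len(vocab)))
    for row, t in enumerate(texts):
        for w in t.lower().split():
            X[row, index[w]] += 1.0
    return X


def spectral_bipartition(texts):
    """Split texts into two clusters using the sign of the Fiedler vector.

    Assumption: edge weights are cosine similarities between term-count
    vectors; the abstract does not specify the paper's actual weighting.
    """
    X = bag_of_words(texts)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0              # guard against empty texts
    Xn = X / norms
    W = Xn @ Xn.T                        # cosine-similarity adjacency matrix
    np.fill_diagonal(W, 0.0)             # no self-loops
    L = np.diag(W.sum(axis=1)) - W       # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]              # second-smallest eigenvalue's vector
    return (fiedler >= 0).astype(int)    # 0/1 cluster label per text


texts = [
    "football match goal the",
    "goal team football the",
    "match team goal the",
    "stock market bank the",
    "bank interest market the",
    "stock interest bank the",
]
labels = spectral_bipartition(texts)
print(labels)
```

In this toy input the shared word "the" keeps the graph connected, so the Fiedler vector is well defined and separates the sports texts from the finance texts. Which group receives label 0 or 1 is arbitrary, because an eigenvector's sign is not fixed.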