A new approach on search for similar documents with multiple categories using fuzzy clustering


Saracoglu R., Tuetuencue K., Allahverdi N.

EXPERT SYSTEMS WITH APPLICATIONS, vol.34, pp.2545-2554, 2008 (Journal Indexed in SCI)

  • Volume: 34 Issue: 4
  • Publication Date: 2008
  • DOI: 10.1016/j.eswa.2007.04.003
  • Journal Name: EXPERT SYSTEMS WITH APPLICATIONS
  • Pages: pp.2545-2554

Abstract

Searching for similar documents plays an important role in text mining and document management. In similar-document search, as in other text mining applications, the focus is generally on document classification: determining the class or category to which each document belongs. The present study investigates the case in which documents belong to more than one category. The system used is a similar-document search system based on fuzzy clustering, which accommodates documents that belong to more than one category. The proposed approach addresses the multi-category problem in two stages. The first stage identifies the documents that belong to more than one category. The second stage determines the categories to which these documents belong. For these two stages, the alpha-threshold Fuzzy Similarity Classification Method (alpha-FSCM) and the Multiple Categories Vector Method (MCVM) are proposed, respectively. Experimental results showed that the proposed system can efficiently distinguish documents that belong to more than one category. In determining which documents belong to which classes, the proposed system performs better and is more successful than the traditional approach. (c) 2007 Elsevier Ltd. All rights reserved.
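
The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not define alpha-FSCM or MCVM, so the code below is not the authors' method. It only assumes that fuzzy clustering yields a membership degree per category for each document, and that a document counts as multi-category when more than one degree reaches a threshold alpha. The function names, the sample categories, and the value alpha = 0.3 are all hypothetical.

```python
from typing import Dict, List


def categories_above_alpha(memberships: Dict[str, float], alpha: float) -> List[str]:
    """Return the categories whose membership degree is at least alpha,
    ordered from strongest to weakest membership.

    Assumption: `memberships` maps category name -> fuzzy membership degree,
    e.g. as produced by a fuzzy clustering step.
    """
    selected = [cat for cat, degree in memberships.items() if degree >= alpha]
    return sorted(selected, key=lambda cat: memberships[cat], reverse=True)


def is_multi_category(memberships: Dict[str, float], alpha: float) -> bool:
    """Stage 1 (illustrative): flag documents with more than one category at or above alpha."""
    return len(categories_above_alpha(memberships, alpha)) > 1


def single_best_category(memberships: Dict[str, float]) -> str:
    """Single-label baseline for comparison: keep only the strongest category."""
    return max(memberships, key=memberships.get)


# Hypothetical membership degrees for two documents.
mixed_doc = {"sports": 0.48, "economy": 0.42, "health": 0.10}
plain_doc = {"sports": 0.85, "economy": 0.10, "health": 0.05}

ALPHA = 0.3  # hypothetical threshold, not a value from the paper

for name, doc in [("mixed_doc", mixed_doc), ("plain_doc", plain_doc)]:
    print(name, is_multi_category(doc, ALPHA), categories_above_alpha(doc, ALPHA))
```

With these sample values, the single-label baseline would assign mixed_doc to "sports" only, whereas the thresholded view keeps both "sports" and "economy"; this is the kind of case the study targets.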