A new approach on search for similar documents with multiple categories using fuzzy clustering

Saracoglu R., Tuetuencue K., Allahverdi N.

EXPERT SYSTEMS WITH APPLICATIONS, vol.34, no.4, pp.2545-2554, 2008 (Peer-Reviewed Journal)

  • Publication Type: Article
  • Volume: 34 Issue: 4
  • Publication Date: 2008
  • Doi Number: 10.1016/j.eswa.2007.04.003
  • Journal Indexes: Science Citation Index Expanded, Scopus
  • Page Numbers: pp.2545-2554


Searching for similar documents plays an important role in text mining and document management. Both similar-document search and other text mining applications generally focus on document classification, that is, on determining the class or category to which each document belongs. The aim of the present study is to investigate the case in which documents belong to more than one category. The system used in the study is a similar-document search system based on fuzzy clustering, and it accounts for documents that belong to more than one category. The proposed approach solves the multi-category problem in two stages. The first stage identifies the documents that belong to more than one category. The second stage determines the categories to which these documents belong. For these two stages, the alpha-threshold Fuzzy Similarity Classification Method (alpha-FSCM) and the Multiple Categories Vector Method (MCVM) are proposed, respectively. Experimental results show that the proposed system can efficiently distinguish documents that belong to more than one category. In determining which documents belong to which classes, the proposed system performs better than the traditional approach. (c) 2007 Elsevier Ltd. All rights reserved.
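
The two-stage idea can be illustrated with a minimal sketch. The abstract does not define alpha-FSCM or MCVM, so the rules below are assumptions made only for illustration and are not the paper's method: a document is flagged as multi-category when at least two of its fuzzy membership degrees reach a threshold alpha, and its category vector marks every category at or above that threshold. The function names, the membership values, and the choice alpha = 0.3 are all hypothetical.

```python
from typing import Dict, List


def multi_category_docs(memberships: List[List[float]], alpha: float) -> List[int]:
    """Stage 1 (assumed rule, not taken from the paper): return the indices of
    documents that have two or more membership degrees >= alpha."""
    flagged = []
    for idx, row in enumerate(memberships):
        if sum(1 for m in row if m >= alpha) >= 2:
            flagged.append(idx)
    return flagged


def category_vector(row: List[float], alpha: float) -> List[int]:
    """Stage 2 (assumed rule, not taken from the paper): binary vector with a 1
    for every category whose membership degree is >= alpha."""
    return [1 if m >= alpha else 0 for m in row]


# Hypothetical fuzzy membership degrees: 3 documents x 3 categories.
# Each row sums to 1, as fuzzy c-means style memberships would.
memberships = [
    [0.90, 0.05, 0.05],
    [0.45, 0.40, 0.15],
    [0.10, 0.20, 0.70],
]
alpha = 0.3  # hypothetical threshold

flagged = multi_category_docs(memberships, alpha)
vectors: Dict[int, List[int]] = {
    i: category_vector(memberships[i], alpha) for i in flagged
}

print(flagged)  # [1]
print(vectors)  # {1: [1, 1, 0]}
```

In this toy input only the second document has two memberships at or above the threshold, so it alone is flagged and assigned to the first two categories. A lower alpha would flag more documents as multi-category, and a higher one fewer.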