ICADL 2007 - LNCS 4822
   

Personal Name Disambiguation in Web Search Results Based on a Semi-supervised Clustering Approach

Kazunari Sugiyama and Manabu Okumura

Precision and Intelligence Laboratory, Tokyo Institute of Technology, 4259 Nagatsuta, Midori, Yokohama, Kanagawa 226-8503, Japan
sugiyama@lr.pi.titech.ac.jp
oku@pi.titech.ac.jp

Abstract. Most previous work on disambiguating personal names in Web search results employs agglomerative clustering. In contrast, we adopt a semi-supervised clustering approach in order to guide the clustering more appropriately. Our proposed approach is novel in that it controls the fluctuation of the centroid of a cluster. It achieved a purity of 0.72 and an inverse purity of 0.81, giving a harmonic mean F of 0.76.
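
The three reported figures are related by simple arithmetic, which the following minimal sketch illustrates. It assumes the standard clustering-evaluation definitions of purity and inverse purity (the abstract itself does not spell them out), and the function names and toy data are hypothetical, not taken from the paper.

```python
from collections import Counter

def purity(clusters, labels):
    """Fraction of documents that carry the dominant label of their cluster.

    Assumes every cluster is non-empty and every document has a label.
    """
    total = sum(len(c) for c in clusters)
    dominant = sum(max(Counter(labels[d] for d in c).values()) for c in clusters)
    return dominant / total

def inverse_purity(clusters, labels):
    """Purity with the roles of system clusters and gold classes swapped."""
    # Map each document to the index of the system cluster that holds it.
    assignment = {d: i for i, c in enumerate(clusters) for d in c}
    # Group documents by their gold label.
    classes = {}
    for d, lab in labels.items():
        classes.setdefault(lab, []).append(d)
    return purity(list(classes.values()), assignment)

def harmonic_mean(p, ip):
    """F measure: harmonic mean of purity and inverse purity."""
    return 2 * p * ip / (p + ip)

# Hypothetical toy data: five pages for one ambiguous name, two real people.
toy_clusters = [["d1", "d2", "d3"], ["d4", "d5"]]
toy_labels = {"d1": "A", "d2": "A", "d3": "B", "d4": "B", "d5": "B"}

print(purity(toy_clusters, toy_labels))
print(inverse_purity(toy_clusters, toy_labels))
# The figures reported in the abstract, combined the same way.
print(round(harmonic_mean(0.72, 0.81), 2))
```

Under these assumed definitions, combining the reported purity of 0.72 and inverse purity of 0.81 gives 2(0.72)(0.81)/(0.72 + 0.81) ≈ 0.76, which agrees with the F value stated in the abstract.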

Keywords: Information retrieval, Semi-supervised clustering, Personal name disambiguation

LNCS 4822, p. 250 ff.



© Springer-Verlag Berlin Heidelberg 2007