CLUSTERING OF INTERESTINGNESS MEASURES BASED ON THE SUMMARY STRUCTURE
Huỳnh Xuân Hiệp¹, Lê Quyết Thắng¹, Fabrice Guillet²
¹ Faculty of Information and Communication Technology, Can Tho University
² LINA UMR CNRS 6241, Polytechnic School of the University of Nantes
{hxhiep, lqthang}@ctu.edu.vn, fabrice.guillet@univ-nantes.fr

Abstract
In this paper, we introduce a new structure called summary for clustering interestingness measures on the basis of complex combinations of the Pearson value correlation coefficient and the Spearman and Kendall rank correlation coefficients. The clustering results are visualized using the correlation graph approach. Together with earlier preliminary results obtained with the correlation graph approach, the observed behavior of the interestingness measures under these complex correlations, shown on extended correlation graphs, demonstrates the agreement of a subset of the interestingness measures in evaluating the quality of knowledge represented as association rules, for each kind of dataset.

Keywords: summary structure, extended correlation graph, association rules, interestingness measures, clustering of interestingness measures.

CLUSTERING OF INTERESTINGNESS MEASURES BASED ON SUMMARY STRUCTURE
Huynh Xuan Hiep¹, Le Quyet Thang¹, Fabrice Guillet²
¹ Faculty of Information and Communication Technology, Can Tho University
² LINA UMR CNRS 6241, University of Nantes
{hxhiep, lqthang}@ctu.edu.vn, fabrice.guillet@univ-nantes.fr

Abstract
In this paper, interestingness measures are clustered on the basis of a new structure, called summary, that uses different complex combinations of the value correlation coefficient (Pearson) and the rank correlation coefficients (Spearman and Kendall). The results are visualized with the correlation graph approach. Together with the preliminary results presented earlier, the observed behavior of the interestingness measures under these complex correlations, shown on extended correlation graphs, demonstrates a high level of agreement among a small set of interestingness measures in evaluating the quality of knowledge represented as association rules, depending on the kind of dataset used.

Key words: summary structure, extended correlation graph, association rules, interestingness measures, clustering of interestingness measures.
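The general idea of clustering measures by correlating their values can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' summary structure. It assumes that each interestingness measure is represented by its values on a common set of rules. It also assumes that the three coefficients are combined by taking their minimum; the abstract only says "complex combinations", so this is an assumption. Two measures are linked when the combined value reaches a threshold `tau` (0.85 is an arbitrary choice here), and clusters are read off as connected components of the resulting graph. The helper names (`combined`, `cluster_measures`) and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson (value) correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant vector has no variance; treat it as uncorrelated.
    return cov / (sx * sy) if sx and sy else 0.0


def ranks(x):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x, y):
    """Kendall tau-b rank correlation (adjusted for ties), O(n^2) pair scan."""
    conc = disc = tie_x = tie_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0:
            tie_x += 1
        if dy == 0:
            tie_y += 1
        if dx != 0 and dy != 0:
            if (dx > 0) == (dy > 0):
                conc += 1
            else:
                disc += 1
    n0 = len(x) * (len(x) - 1) // 2
    denom = sqrt((n0 - tie_x) * (n0 - tie_y))
    return (conc - disc) / denom if denom else 0.0


def combined(x, y):
    """Hypothetical combination: keep the weakest of the three coefficients."""
    return min(pearson(x, y), spearman(x, y), kendall_tau_b(x, y))


def cluster_measures(values, tau=0.85):
    """Link measures whose combined coefficient is >= tau; return connected components."""
    names = list(values)
    parent = {m: m for m in names}

    def find(m):
        # Union-find root lookup with path halving.
        while parent[m] != m:
            parent[m] = parent[parent[m]]
            m = parent[m]
        return m

    for a, b in combinations(names, 2):
        if combined(values[a], values[b]) >= tau:
            parent[find(a)] = find(b)
    groups = {}
    for m in names:
        groups.setdefault(find(m), []).append(m)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    # Toy values of three hypothetical measures on the same five rules.
    values = {
        "m1": [0.9, 0.8, 0.7, 0.6, 0.5],
        "m2": [1.8, 1.6, 1.4, 1.2, 1.0],
        "m3": [0.1, 0.3, 0.2, 0.5, 0.4],
    }
    print(cluster_measures(values))  # [['m1', 'm2'], ['m3']]
```

Here m1 and m2 agree perfectly under all three coefficients, while m3 has Spearman rho = -0.8 and Kendall tau = -0.6 against both, so it stays in its own cluster. Taking the minimum is a deliberately strict choice: a pair is linked only if value-based and rank-based correlations all agree.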