A Bayes Optimal Approach for Partitioning the Values of Categorical Attributes

Marc Boullé; 6(Sep):1431--1452, 2005.

Abstract

In supervised machine learning, the partitioning of the values (also called grouping) of a categorical attribute aims at constructing a new synthetic attribute which keeps the information of the initial attribute and reduces the number of its values. In this paper, we propose a new grouping method MODL founded on a Bayesian approach. The method relies on a model space of grouping models and on a prior distribution defined on this model space. This results in an evaluation criterion of grouping, which is minimal for the most probable grouping given the data, i.e. the Bayes optimal grouping. We propose new super-linear optimization heuristics that yields near-optimal groupings. Extensive comparative experiments demonstrate that the MODL grouping method builds high quality groupings in terms of predictive quality, robustness and small number of groups.

[abs][pdf]




Home Page

Papers

Submissions

News

Editorial Board

Announcements

Proceedings

Open Source Software

Search

Statistics

Login

Contact Us



RSS Feed