
Monday, August 5, 2013

Concepts in Microsoft Association Rules: Lift, Support, Importance, and Probability

       You have probably heard the famous diapers and beer story that illustrates correlation in shoppers' baskets. Market basket analysis by association rule mining has been widely used by retailers since the 1990s to adjust store layouts and to develop cross-promotion plans and catalogs. More recently, instant recommendations based on association rules have become an active research area. The Microsoft Association Rules algorithm is commonly used to create association rules for market basket analysis. It supports several parameters that affect the behavior, performance, and accuracy of the resulting mining model. It is therefore important to have a clear understanding of the following concepts.


LIFT


In data mining and association rule learning, lift is a measure of the performance of a model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random-choice targeting model. For example, suppose that 5% of the customers who are mailed a catalog without using the model make a purchase, while a certain model (or rule) identifies a segment with a response rate of 15%. That segment then has a lift of 3.0 (15%/5%). Lift indicates how much the model improves predictions over a random selection, given actual results.
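The catalog example above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name `lift` and its parameters are made up for this post and are not part of any Microsoft API.

```python
def lift(segment_response_rate: float, baseline_response_rate: float) -> float:
    """Ratio of the targeted segment's response rate to the baseline rate.

    Hypothetical helper for illustration only.
    """
    if baseline_response_rate <= 0:
        raise ValueError("baseline_response_rate must be positive")
    return segment_response_rate / baseline_response_rate


# Numbers from the catalog example: 15% response in the segment
# picked by the model, 5% response for untargeted mailing.
example_lift = lift(0.15, 0.05)
print(f"lift = {example_lift:.1f}")
```

A lift above 1 means the segment responds better than a random selection, a lift of 1 means no improvement, and a lift below 1 means the segment responds worse.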

SUPPORT


Support is the probability that a transaction contains the targeted item or itemset. The larger the support, the more cases in the model contain the targeted item or combination of items. You can use the MINIMUM_SUPPORT and MAXIMUM_SUPPORT parameters to define the thresholds. By default, MINIMUM_SUPPORT is 0.03 and MAXIMUM_SUPPORT is 1.0.
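To make the definition concrete, here is a minimal sketch that computes support as a fraction of transactions. The baskets are invented sample data and the `support` function is a hypothetical helper, not how the Microsoft algorithm is implemented internally.

```python
# Invented sample baskets, one set of items per transaction.
transactions = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"milk", "bread"},
    {"beer", "chips"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    if not transactions:
        raise ValueError("transactions must not be empty")
    itemset = frozenset(itemset)
    matching = sum(1 for basket in transactions if itemset <= basket)
    return matching / len(transactions)


diapers_support = support({"diapers"}, transactions)               # 3 of 5 baskets
diapers_beer_support = support({"diapers", "beer"}, transactions)  # 2 of 5 baskets
print(diapers_support, diapers_beer_support)
```

An itemset is kept only if its support falls between the minimum and maximum thresholds, so with this sample data a minimum of 0.5 would keep {diapers} but drop {diapers, beer}.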

RULES


The Rules tab in the Microsoft Association Rules Viewer displays the Probability, Importance, and Rule columns for each rule that the mining algorithm finds.

Rule: A rule is a description of the presence of an item in a transaction based on the presence of other items.

Probability: The likelihood of a rule, defined as the probability of the right-hand side item given the left-hand side item. By default, MINIMUM_PROBABILITY is 0.4. However, probability can be misleading. For example, if every transaction contains a gift bag (perhaps the gift bag is added to each customer's cart automatically as a promotion), a rule predicting the gift bag has a probability of 1. It is accurate but not very useful. To judge the usefulness of a rule, Importance is the right measure to use.
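The gift bag problem is easy to reproduce. The sketch below computes a rule's probability as the share of left-hand side baskets that also contain the right-hand side. The baskets and the `rule_probability` helper are invented for illustration.

```python
# Invented sample baskets; every one contains the promotional gift bag.
transactions = [
    {"diapers", "beer", "gift bag"},
    {"diapers", "beer", "gift bag"},
    {"diapers", "bread", "gift bag"},
    {"milk", "bread", "gift bag"},
    {"beer", "chips", "gift bag"},
    {"milk", "gift bag"},
]


def rule_probability(lhs, rhs, transactions):
    """Estimate P(rhs | lhs): among baskets containing lhs, the share that also contain rhs."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    lhs_count = sum(1 for basket in transactions if lhs <= basket)
    if lhs_count == 0:
        raise ValueError("no transaction contains the left-hand side")
    both_count = sum(1 for basket in transactions if (lhs | rhs) <= basket)
    return both_count / lhs_count


p_beer = rule_probability({"diapers"}, {"beer"}, transactions)          # 2 of 3 diaper baskets
p_gift_bag = rule_probability({"diapers"}, {"gift bag"}, transactions)  # 3 of 3 diaper baskets
print(f"diapers -> beer: {p_beer:.2f}")
print(f"diapers -> gift bag: {p_gift_bag:.2f}")
```

The gift bag rule gets the highest possible probability even though diapers tell us nothing about gift bags, which is exactly why probability alone is not enough to rank rules.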

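Importance: A measure of the usefulness of a rule. A greater value means a better rule. The importance of a rule is calculated as the log likelihood of the right-hand side of the rule, given the left-hand side of the rule. For example, for the rule If {A} then {B}, the importance is Log( Pr(B | A) / Pr(B | not A) ), that is, the probability of B in cases that contain A compared with the probability of B in cases that do not. A positive value means that A makes B more likely, zero means that A has no effect on B, and a negative value means that A makes B less likely.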
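As a rough sketch, importance can be computed by hand on a toy data set. The code assumes the ratio form Log( Pr(B | A) / Pr(B | not A) ) and a base-10 logarithm; the choice of base, the sample baskets, and the `rule_importance` helper are assumptions for illustration, and the real algorithm may handle zero counts differently.

```python
import math

# Invented sample baskets; every one contains the promotional gift bag.
transactions = [
    {"diapers", "beer", "gift bag"},
    {"diapers", "beer", "gift bag"},
    {"diapers", "bread", "gift bag"},
    {"milk", "bread", "gift bag"},
    {"beer", "chips", "gift bag"},
    {"milk", "gift bag"},
]


def rule_importance(lhs, rhs, transactions):
    """log10( P(rhs | lhs) / P(rhs | not lhs) ) for the rule lhs -> rhs.

    Hypothetical helper; the log base is an assumption.
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    with_lhs = [b for b in transactions if lhs <= b]
    without_lhs = [b for b in transactions if not (lhs <= b)]
    if not with_lhs or not without_lhs:
        raise ValueError("need baskets both with and without the left-hand side")
    p_given_lhs = sum(1 for b in with_lhs if rhs <= b) / len(with_lhs)
    p_given_not_lhs = sum(1 for b in without_lhs if rhs <= b) / len(without_lhs)
    if p_given_lhs == 0:
        return -math.inf  # rhs never appears together with lhs
    if p_given_not_lhs == 0:
        return math.inf   # rhs appears only together with lhs
    return math.log10(p_given_lhs / p_given_not_lhs)


beer_importance = rule_importance({"diapers"}, {"beer"}, transactions)
gift_bag_importance = rule_importance({"diapers"}, {"gift bag"}, transactions)
print(f"diapers -> beer: {beer_importance:.3f}")
print(f"diapers -> gift bag: {gift_bag_importance:.3f}")
```

In this sample, beer is twice as likely in baskets with diapers as in baskets without, so the rule gets a positive importance. The gift bag rule has a probability of 1 but an importance of 0, because the gift bag is just as likely without diapers.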
