The formula for the mutual information between two random variables $X$ and $Y$ is

$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

A random variable $X$ is little more than a set of outcomes associated with a probability function $p$. Because of the way probabilities must work, there are rules about the relationships that must hold between the **joint probability** $p(x,y)$, the **marginal probabilities** $p(x)$ and $p(y)$, and the **conditional probabilities** $p(x|y)$ and $p(y|x)$:

$$p(x,y) = p(x|y)\,p(y) = p(y|x)\,p(x), \qquad p(x) = \sum_{y \in Y} p(x,y)$$
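These relationships are easy to check numerically. Here is a minimal sketch in Python, using a made-up joint distribution over two binary variables (the outcome names and numbers are purely illustrative):

```python
# A toy joint distribution p(x, y); the numbers are made up for illustration.
joint = {
    ("rain", "umbrella"): 0.30,
    ("rain", "no_umbrella"): 0.10,
    ("dry", "umbrella"): 0.05,
    ("dry", "no_umbrella"): 0.55,
}

# Marginals: p(x) = sum over y of p(x, y), and likewise for p(y).
p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

# Conditional: p(x | y) = p(x, y) / p(y).
def p_x_given_y(x, y):
    return joint[(x, y)] / p_y[y]

# The product rule must hold: p(x, y) = p(x | y) * p(y).
for (x, y), p in joint.items():
    assert abs(p - p_x_given_y(x, y) * p_y[y]) < 1e-12
```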

A common pattern in statistical NLP is to use estimates of the components of $I(X;Y)$ to define properties either of individual words or of word pairs. Peter Turney’s article on PMI-IR (http://arxiv.org/abs/cs.LG/0212033) nicely explains what is going on here.

You can write two expressions for what Turney calls **pointwise mutual information** by slightly changing the summation in the expression above.

For a word $x$ we have:

$$\mathrm{pmi}(x) = \sum_{y \in Y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$

and for a word pair $(x,y)$ we have:

$$\mathrm{pmi}(x,y) = \log \frac{p(x,y)}{p(x)\,p(y)}$$

Even though the mathematics dictates that $I(X;Y)$ is always non-negative, the PMI functions can produce either positive or negative quantities. In practice, the distributions that arise in NLP tend to yield large positive $\mathrm{pmi}(x,y)$ values for adjacent word pairs that intuitively feel strongly associated, such as “*Swiss bank*”. However, this is an empirical rather than a mathematical fact. Similarly, words that occur in what intuitively seem like “highly predictable contexts” will tend to have large positive $\mathrm{pmi}(x)$.
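The word-pair version is easy to estimate from counts. A minimal sketch, using maximum-likelihood estimates over a tiny made-up token sequence (the corpus and counts are purely illustrative, not from any real data):

```python
import math
from collections import Counter

# Toy "corpus"; the text is made up for illustration only.
tokens = ("swiss bank accounts and swiss cheese and swiss bank "
          "records and other bank records").split()

unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
n_uni = sum(unigrams.values())
n_bi = sum(bigrams.values())

def pmi(w1, w2):
    """Pointwise mutual information of an adjacent word pair,
    using maximum-likelihood probability estimates."""
    p_xy = bigrams[(w1, w2)] / n_bi
    p_x = unigrams[w1] / n_uni
    p_y = unigrams[w2] / n_uni
    return math.log(p_xy / (p_x * p_y))
```

On this toy corpus `pmi("swiss", "bank")` comes out positive, as the intuition about strongly associated pairs suggests; with real corpora the usual caveats about sparse counts (zero bigram counts make the logarithm undefined) apply.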

The “intuitively seems like” business above should alert you to the possibility that the PMI values may not **actually** correspond to anything meaningful. $I(X;Y)$ is definitely meaningful, but it doesn’t tell you much about individual words or word pairs.

There is a chain rule relating **joint and conditional entropy:**

$$H(X,Y) = H(X) + H(Y|X)$$

where $H(Y|X)$ is the conditional entropy, defined as:

$$H(Y|X) = -\sum_{x \in X} \sum_{y \in Y} p(x,y) \log p(y|x)$$

A little algebra rearranges the chain rule expression to give

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$

This is the quantity $I(X;Y)$ that we call mutual information. The chain rule form makes it obvious that it is all about the difference between guessing one of the variables outright and guessing one when you already know the other. It should now also be intuitive that mutual information is non-negative, since guessing something when you are given a clue can be no harder than guessing it outright. As mentioned above, this does not guarantee that all the PMI terms that make up $I(X;Y)$ will be positive.
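The equivalence between the direct sum of pointwise terms and the chain-rule rearrangement can be verified numerically. A minimal sketch, reusing the same style of made-up joint distribution as before:

```python
import math

# Toy joint distribution p(x, y); the numbers are illustrative only.
joint = {
    ("rain", "umbrella"): 0.30,
    ("rain", "no_umbrella"): 0.10,
    ("dry", "umbrella"): 0.05,
    ("dry", "no_umbrella"): 0.55,
}

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p
    p_y[y] = p_y.get(y, 0.0) + p

def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

# Conditional entropy H(Y|X) = -sum p(x, y) log p(y|x), with p(y|x) = p(x, y) / p(x).
h_y_given_x = -sum(p * math.log(p / p_x[x]) for (x, y), p in joint.items())

# Mutual information two ways: the direct double sum of pointwise terms,
# and the chain-rule rearrangement I(X;Y) = H(Y) - H(Y|X).
mi_direct = sum(p * math.log(p / (p_x[x] * p_y[y]))
                for (x, y), p in joint.items())
mi_chain = entropy(p_y) - h_y_given_x

assert abs(mi_direct - mi_chain) < 1e-12
assert mi_direct >= 0  # mutual information is non-negative
```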
