Appendix A.

These values represent the (empirical) probabilities of taking a non-zero value and the value $0$, respectively. Entropies will be denoted by $H(\cdot)$, with $H_b(p)$ the entropy of a binary variable of probability $p$. We aim to show that the mutual information $I(x;y)$ of two non-co-occurring variables $x$ and $y$ depends only on their occurrence probabilities $p_x = P(x \neq 0)$ and $p_y = P(y \neq 0)$. We first note a ``chain rule'' expression for the entropy of a discrete variable. In particular, the entropy of any multivalued discrete variable $z$ can be decomposed in the following way:

\[
H(z) \;=\; H_b\bigl(P(z \neq 0)\bigr) \;+\; P(z \neq 0)\,H(z \mid z \neq 0). \tag{12}
\]
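As a quick numerical sanity check of this decomposition (a sketch of our own, not part of the derivation; the four-valued distribution below is an arbitrary example):

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def h_b(p):
    """Binary entropy of a Bernoulli(p) variable."""
    return entropy([p, 1 - p])

# An arbitrary multivalued variable z over values 0..3.
pz = [0.5, 0.2, 0.2, 0.1]

p_nonzero = sum(pz[1:])                 # P(z != 0)
cond = [p / p_nonzero for p in pz[1:]]  # distribution of z given z != 0

lhs = entropy(pz)
rhs = h_b(p_nonzero) + p_nonzero * entropy(cond)
assert abs(lhs - rhs) < 1e-12  # chain-rule decomposition holds exactly
```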
Note moreover that the mutual information of two non-co-occurring variables is $I(x;y) = H(y) - H(y \mid x)$. The second term, the conditional entropy of $y$ given $x$, is

\[
H(y \mid x) \;=\; \sum_{v} P(x = v)\,H(y \mid x = v) \;=\; (1 - p_x)\,H(y \mid x = 0),
\]

since $y = 0$ with certainty whenever $x \neq 0$, so that $H(y \mid x = v) = 0$ for every $v \neq 0$.
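This collapse of the conditional entropy can be checked numerically (our own sketch; the joint distribution below is an arbitrary example of non-co-occurring variables over $\{0,1,2\}$):

```python
import math

def entropy(probs):
    """Shannon entropy (bits) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical non-co-occurring joint over {0,1,2} x {0,1,2}:
# whenever x != 0 we have y = 0, and vice versa.
joint = {(0, 0): 0.3, (1, 0): 0.2, (2, 0): 0.1, (0, 1): 0.25, (0, 2): 0.15}

p_x = sum(p for (a, _), p in joint.items() if a != 0)  # P(x != 0)

# H(y | x) computed from its definition, summing over values of x.
h_y_given_x = 0.0
for v in {a for a, _ in joint}:
    p_v = sum(p for (a, _), p in joint.items() if a == v)
    cond = [p / p_v for (a, _), p in joint.items() if a == v]
    h_y_given_x += p_v * entropy(cond)

# The collapsed form: only the x = 0 slice contributes.
p0 = 1 - p_x
cond0 = [p / p0 for (a, _), p in joint.items() if a == 0]
assert abs(h_y_given_x - p0 * entropy(cond0)) < 1e-9
```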
We now expand $H(y \mid x = 0)$ using the decomposition in (12):

\[
H(y \mid x = 0) \;=\; H_b\bigl(P(y \neq 0 \mid x = 0)\bigr) \;+\; P(y \neq 0 \mid x = 0)\,H(y \mid y \neq 0,\, x = 0).
\]
Because $x$ and $y$ are never non-zero at the same time, all non-zero values of $y$ are paired with zero values of $x$. Consequently $P(y = v \mid x = 0) = P(y = v)/P(x = 0)$ for every $v \neq 0$, and $H(y \mid y \neq 0,\, x = 0) = H(y \mid y \neq 0)$. The term denoted $H_b\bigl(P(y \neq 0 \mid x = 0)\bigr)$ is the entropy of a binary variable whose probability is $P(y \neq 0 \mid x = 0)$. This probability equals

\[
P(y \neq 0 \mid x = 0) \;=\; \frac{P(y \neq 0)}{P(x = 0)} \;=\; \frac{p_y}{1 - p_x}.
\]
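A one-line numerical check of this identity (our own sketch; the joint below is an arbitrary non-co-occurring example, and $p_x$, $p_y$ denote the occurrence probabilities $P(x \neq 0)$, $P(y \neq 0)$):

```python
# Hypothetical non-co-occurring joint (x, y) over {0,1,2}: whenever
# x != 0 we have y = 0, and vice versa.
joint = {(0, 0): 0.3, (1, 0): 0.2, (2, 0): 0.1, (0, 1): 0.25, (0, 2): 0.15}

p_x = sum(p for (a, _), p in joint.items() if a != 0)  # P(x != 0) = 0.3
p_y = sum(p for (_, b), p in joint.items() if b != 0)  # P(y != 0) = 0.4

# P(y != 0 | x = 0), computed straight from the joint.
p_x0 = sum(p for (a, _), p in joint.items() if a == 0)
p_ynz_given_x0 = sum(p for (a, b), p in joint.items()
                     if a == 0 and b != 0) / p_x0

assert abs(p_ynz_given_x0 - p_y / (1 - p_x)) < 1e-12
```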
Note that in order to obtain a valid probability (one not exceeding $1$) in the above equation one needs $p_x + p_y \le 1$, a condition that is always satisfied if $x$ and $y$ do not co-occur. Replacing the previous three equations in the formula of the mutual information, and decomposing $H(y) = H_b(p_y) + p_y\,H(y \mid y \neq 0)$ by (12) so that the $H(y \mid y \neq 0)$ terms cancel, we get

\[
I(x;y) \;=\; H_b(p_y) \;-\; (1 - p_x)\,H_b\!\left(\frac{p_y}{1 - p_x}\right).
\]
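To illustrate, here is a numerical check of this closed form (our own sketch; the joint below is an arbitrary example of non-co-occurring variables over $\{0,1,2\}$ with occurrence probabilities $p_x = P(x \neq 0) = 0.3$ and $p_y = P(y \neq 0) = 0.4$):

```python
import math

def h_b(p):
    """Binary entropy (bits)."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

# Non-co-occurring joint: whenever x != 0, y = 0 and vice versa.
joint = {
    (0, 0): 0.3,
    (1, 0): 0.2, (2, 0): 0.1,    # x occurs (p_x = 0.3), y = 0
    (0, 1): 0.25, (0, 2): 0.15,  # y occurs (p_y = 0.4), x = 0
}

# Mutual information computed directly from the joint.
px, py = {}, {}
for (a, b), p in joint.items():
    px[a] = px.get(a, 0) + p
    py[b] = py.get(b, 0) + p
mi = sum(p * math.log2(p / (px[a] * py[b])) for (a, b), p in joint.items())

p_x, p_y = 0.3, 0.4
closed_form = h_b(p_y) - (1 - p_x) * h_b(p_y / (1 - p_x))
assert abs(mi - closed_form) < 1e-9  # matches the closed form
```

Note that the non-zero values of $x$ and $y$ can be redistributed arbitrarily (e.g. split $0.2/0.1$ into $0.15/0.15$) without changing `mi`, in line with the observation that only $p_x$ and $p_y$ matter.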
This expression, remarkably, depends only on $p_x$ and $p_y$. Taking its partial derivative with respect to $p_x$ yields

\[
\frac{\partial I}{\partial p_x} \;=\; -\log\!\left(1 - \frac{p_y}{1 - p_x}\right),
\]

a value that is always positive (the argument of the logarithm lies strictly between $0$ and $1$), independently of $p_y$. This shows the mutual information increases monotonically with the ``occurrence frequency'' of $x$ given by $p_x$. Note also that the above expression for the derivative is the same as the result obtained for binary variables in (11).
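The derivative and the monotonicity claim are easy to verify numerically (our own sketch; $p_x$, $p_y$ denote the occurrence probabilities $P(x \neq 0)$, $P(y \neq 0)$, entropies and logarithms are in base 2, and the probability values are arbitrary):

```python
import math

def h_b(p):
    """Binary entropy (bits)."""
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def mi(p_x, p_y):
    """Closed-form mutual information of two non-co-occurring variables."""
    return h_b(p_y) - (1 - p_x) * h_b(p_y / (1 - p_x))

p_y = 0.2
for p_x in (0.1, 0.3, 0.5, 0.7):
    analytic = -math.log2(1 - p_y / (1 - p_x))
    eps = 1e-6
    numeric = (mi(p_x + eps, p_y) - mi(p_x - eps, p_y)) / (2 * eps)
    assert analytic > 0                    # derivative is always positive
    assert abs(analytic - numeric) < 1e-5  # matches a central difference
```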