Often, systems only consider noiseless data and try to learn ``perfect'' rules (with an accuracy of 1). In order to deal with noise, some systems integrate mechanisms for stopping the expansion (see [Pazzani and Kibler(1992)]) or pruning mechanisms (for example, reduced error pruning in [Brunk and Pazzani(1991)]). As a stopping mechanism, FOIL uses an encoding-length heuristic indicating that the rule is becoming too complex. Another simple way is to set a threshold lower than 1, as in [Soderland(1999)]. This is the solution we have chosen to adopt. But what about the few percent that are wrongly categorized? Should we ignore them? The refinement process tries to learn these cases, which are considered exceptions to the rule just learned. Let us take a simple example in order to illustrate this mechanism. Consider the rule:
<RULE NUM='1' ACC='0.98' FREQ='29027'> <N C='NN'/> </RULE>
Its accuracy is 0.98. Suppose the threshold is set below this value, so the rule is accepted. The refinement process then tries to learn the remaining wrongly categorized occurrences and generates, among others, the following rules:
<RULE EXCEP='1' NUM='3' ACC='1.00' FREQ='17'> <N W='have' C='NN'/> </RULE>
<RULE EXCEP='1' NUM='98' ACC='0.90' FREQ='9'> <N C='IN' LEFT='1'/> <N W='order' C='NN'/> <N C='TO' RIGHT='1'/> </RULE>
Such exceptions are called refinement rules. The feature EXCEP points to the identifier of the rule to which they are an exception. The two exceptions have different explanations. The first refinement rule illustrates well the usefulness of refinement for noisy data: in the training data, some occurrences of the word have are wrongly tagged as NN (singular noun) instead of as a verb (remember that this tagging is done automatically).
The second rule illustrates another kind of ``noise'', or more precisely, a linguistic exception in the encoding scheme. The guidelines of the Penn Treebank, from which our data is extracted, specify that in the context in order to, the word order has to be tagged NN but does not belong to an NP. One can find other exceptions similar to this one in the guidelines (subject to, for instance). These exceptions, which are not noise since they are linguistically motivated, can easily be captured thanks to refinement.
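To make the representation concrete, the following Python sketch parses rule elements like the ones above into plain dictionaries. The attribute names (NUM, ACC, FREQ, EXCEP, W, C, LEFT, RIGHT) are taken from the examples; the function and field names are ours and purely illustrative.

import xml.etree.ElementTree as ET

def parse_rule(xml_string):
    # Turn a <RULE> element into a dictionary (illustrative sketch only).
    elem = ET.fromstring(xml_string)
    return {
        "num": int(elem.get("NUM")),
        "accuracy": float(elem.get("ACC")),
        "frequency": int(elem.get("FREQ")),
        # EXCEP, when present, is the identifier of the mother rule.
        "excep": int(elem.get("EXCEP")) if elem.get("EXCEP") else None,
        # Each <N> child is one constraint: a word (W), a tag (C),
        # and an optional LEFT/RIGHT offset relative to the focus element.
        "pattern": [dict(n.attrib) for n in elem.findall("N")],
    }

rule_98 = parse_rule("<RULE EXCEP='1' NUM='98' ACC='0.90' FREQ='9'>"
                     "<N C='IN' LEFT='1'/> <N W='order' C='NN'/>"
                     "<N C='TO' RIGHT='1'/> </RULE>")
print(rule_98["excep"])  # 1, i.e. an exception to rule 1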
ALLiS can also learn exceptions to exceptions (and so on). This happens rarely when the threshold is high enough, but frequently in other cases.
The next rule could be an exception to rule 98, but it is not kept since its frequency is too small:
<RULE EXCEP='98' NUM='99' ACC='1.00' FREQ='1'> <N W='of' C='IN' LEFT='1'/> <N W='order' C='NN'/> <N C='TO' RIGHT='1'/> </RULE>
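As a simple illustration of this filtering step (the exact minimum frequency used by ALLiS is not given here, so min_freq below is an assumed parameter), candidate refinement rules can be pruned as follows:

def keep_refinement_rules(candidates, min_freq=2):
    # Discard candidate exceptions whose support is too small.
    # `candidates` are dictionaries with a "frequency" field, as produced
    # by parse_rule() above; min_freq is an assumption, not ALLiS's value.
    return [rule for rule in candidates if rule["frequency"] >= min_freq]

# With min_freq=2, rule 99 above (FREQ='1') is dropped and rule 98 (FREQ='9') is kept.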
Each refinement rule is in fact a more specific rule describing a context in which the ``mother'' rule (the one referred to by the feature EXCEP) is no longer valid.
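This mechanism can be sketched as follows (a hypothetical re-implementation, not ALLiS's actual code): before a mother rule fires on a given element, its refinement rules, being more specific, are checked first; if one of them matches, the mother rule is suppressed in that context.

RULE_1  = {"num": 1, "excep": None, "pattern": [{"C": "NN"}]}
RULE_98 = {"num": 98, "excep": 1,
           "pattern": [{"C": "IN", "LEFT": "1"},
                       {"W": "order", "C": "NN"},
                       {"C": "TO", "RIGHT": "1"}]}

def constraint_matches(token, constraint):
    # A single <N> constraint matches if its word and/or tag agree.
    if token is None:
        return False
    if "W" in constraint and token["w"] != constraint["W"]:
        return False
    if "C" in constraint and token["c"] != constraint["C"]:
        return False
    return True

def rule_matches(tokens, i, rule):
    # Check every constraint of the rule around the focus position i.
    for constraint in rule["pattern"]:
        if "LEFT" in constraint:
            offset = -int(constraint["LEFT"])
        else:
            offset = int(constraint.get("RIGHT", 0))
        j = i + offset
        token = tokens[j] if 0 <= j < len(tokens) else None
        if not constraint_matches(token, constraint):
            return False
    return True

def rule_applies(tokens, i, rule, refinements):
    # The mother rule applies only if none of its refinement rules match.
    if not rule_matches(tokens, i, rule):
        return False
    return not any(rule_matches(tokens, i, r)
                   for r in refinements if r["excep"] == rule["num"])

tokens = [{"w": "in", "c": "IN"}, {"w": "order", "c": "NN"}, {"w": "to", "c": "TO"}]
# Rule 1 matches 'order' (tag NN), but its exception (rule 98) also matches,
# so 'order' is not analysed as belonging to an NP here.
print(rule_applies(tokens, 1, RULE_1, [RULE_98]))  # False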