
Advantage of the new categories:

Is such knowledge interesting or useful? One simple experiment is to use this information to compute a new baseline. As with the preceding baseline, each word is assigned its most frequent category. The baseline now reaches an F1-score of 86%, an improvement of 6%.
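The most-frequent-category baseline can be sketched as follows. This is a minimal illustration, not the implementation used for the experiment: the function names (`train_baseline`, `categorize`), the `fallback` category for unseen elements, and the toy training pairs are all assumptions made for the example.

```python
from collections import Counter, defaultdict

def train_baseline(tagged_corpus):
    """Map each element to its most frequent category.

    tagged_corpus: iterable of (element, category) pairs.
    """
    counts = defaultdict(Counter)
    for element, category in tagged_corpus:
        counts[element][category] += 1
    # most_common(1) returns a one-item list holding (category, count).
    return {elem: c.most_common(1)[0][0] for elem, c in counts.items()}

def categorize(elements, model, fallback="O"):
    """Assign each element its default category.

    Unseen elements receive `fallback` (a hypothetical choice).
    """
    return [model.get(elem, fallback) for elem in elements]

# Toy data, invented for illustration only.
toy_corpus = [
    ("JJ", "AL"), ("JJ", "AL"), ("JJ", "O"),
    ("PRP", "NU"),
    ("VBG", "O"), ("VBG", "O"), ("VBG", "AL"),
]
model = train_baseline(toy_corpus)
predictions = categorize(["PRP", "VBG", "JJ", "NN"], model)
```

The same two functions serve for either category set; only the labels in the training pairs change.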

Table 6 illustrates the advantages of the new subcategories well. Take the element JJ. In the old categorization, its default category is I (84%). In the new categorization, its default category is left adjunct (AL), with a very high accuracy (99%), so it is much easier to categorize. In other words, an adjective belongs to an NP if it occurs before a noun (with 1% of exceptions).


Table 6: Distribution of some elements in the old and new categories.

       Old categories         New categories
                              ------ I ------
TAG    O     B-NP   I-NP      NU     AL     AR     O      B+
PRP    0%    9%     91%       100%   -             0%     100%
NN     2%    17%    81%       98%    -             2%     3%
JJ     15%   1%     84%       -      99%    -      1%     24%
VBG    87%   6%     10%       -      22%    -      78%    3%


Another advantage of this categorization concerns the breaker problem. In the new categorization, the break is a property that any category can have, not a competing category. This corresponds to the following view. The first problem is whether an element belongs to a given structure (categorization into NU or A). The second problem is whether the element is a breaker. The two questions are not in competition as they are in the IOB categorization: if an element is a breaker (tagged B in the old categorization), it must belong to the structure.

The tag PRP offers a good example. In the old categorization, it is mostly tagged I (91%) and sometimes B. In the new categorization, it is always tagged NU and always has the break property. The 9% of B tags correspond to the 9% of cases where the preceding element is part of an NP. Two rules are then sufficient where a dozen were required with the old categorization.
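To make the separation of membership and break concrete, the sketch below represents each element as a (category, breaker) pair and derives the old IOB tag from it. The mapping rule is an assumption inferred from the PRP example (an element inside the structure is tagged B only when it is a breaker and the preceding element is also inside). It is not a procedure given in the text, and the function name `to_iob` is hypothetical.

```python
# Categories counted as "inside" the structure (nucleus and adjuncts).
INSIDE = {"NU", "AL", "AR"}

def to_iob(elements):
    """Derive old-style IOB tags from (category, is_breaker) pairs.

    Assumed rule: an inside element is tagged B only if it has the
    break property AND the preceding element is inside as well;
    every other inside element is tagged I.
    """
    tags = []
    prev_inside = False
    for category, is_breaker in elements:
        if category not in INSIDE:
            tags.append("O")
            prev_inside = False
        else:
            tags.append("B" if is_breaker and prev_inside else "I")
            prev_inside = True
    return tags

# A breaker such as PRP is tagged I after an outside element
# and B directly after another inside element.
example = [("NU", True), ("O", False), ("NU", True), ("NU", True)]
iob_tags = to_iob(example)
```

Under this assumption the break property is stored once per element, and the I versus B distinction follows from context rather than being learned as a separate category.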


Hammerton J. 2002-03-13