Is such knowledge interesting or useful?
A simple experiment answers this question: the information is used to compute a new baseline.
As with the preceding baseline, each word is assigned to its most frequent category.
The baseline now reaches an F1-score of 86%, an improvement of 6%.
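The most-frequent-category baseline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `train_baseline` and `apply_baseline`, the fallback category `"O"` for unseen elements, and the toy training pairs are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def train_baseline(tagged_corpus):
    """Map each element to the category it receives most often.

    tagged_corpus is an iterable of (element, category) pairs.
    """
    counts = defaultdict(Counter)
    for element, category in tagged_corpus:
        counts[element][category] += 1
    # most_common(1) returns a list holding the single (category, count) pair
    return {element: c.most_common(1)[0][0] for element, c in counts.items()}


def apply_baseline(model, elements, fallback="O"):
    """Categorize each element; unseen elements get the (assumed) fallback."""
    return [model.get(element, fallback) for element in elements]


# Toy pairs invented for illustration only; they are not corpus data.
toy_corpus = [("JJ", "AL"), ("JJ", "AL"), ("JJ", "O"), ("NN", "NU"), ("PRP", "NU")]
model = train_baseline(toy_corpus)
print(apply_baseline(model, ["JJ", "NN", "VBG"]))  # ['AL', 'NU', 'O']
```

Scoring such a baseline with F1 would additionally require grouping the predicted categories into chunks and comparing them with the reference chunks, which is omitted here.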
Table 6 illustrates the advantages of the new subcategories.
Take the element JJ. Its default category is I (84%) in the old categorization.
In the new categorization, its default category is left adjunct (AL), with a very high accuracy (99%).
It is therefore much easier to categorize.
In other words, an adjective belongs to an NP if it occurs before a noun (exceptions account for 1%).
Table 6: Distribution of some elements in the old and new categories.
| TAG | Old O | Old B-NP | Old I-NP | New NU | New AL | New AR | New O | New B+ |
|---|---|---|---|---|---|---|---|---|
| PRP | 0% | 9% | 91% | 100% | - | | 0% | 100% |
| NN | 2% | 17% | 81% | 98% | - | | 2% | 3% |
| JJ | 15% | 1% | 84% | - | 99% | - | 1% | 24% |
| VBG | 87% | 6% | 10% | - | 22% | - | 78% | 3% |
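The comparison of default categories can be reproduced directly from the percentages in Table 6. The sketch below is only an illustration: the dictionaries `OLD` and `NEW` and the helper `default_category` are names introduced here, cells shown as "-" or left empty in the table are simply omitted, and the B+ column is left out because the break is a property rather than a competing category.

```python
# Percentages copied from Table 6 ("-" and empty cells are omitted).
OLD = {
    "PRP": {"O": 0, "B-NP": 9, "I-NP": 91},
    "NN": {"O": 2, "B-NP": 17, "I-NP": 81},
    "JJ": {"O": 15, "B-NP": 1, "I-NP": 84},
    "VBG": {"O": 87, "B-NP": 6, "I-NP": 10},
}
NEW = {
    "PRP": {"NU": 100, "O": 0},
    "NN": {"NU": 98, "O": 2},
    "JJ": {"AL": 99, "O": 1},
    "VBG": {"AL": 22, "O": 78},
}


def default_category(distribution):
    """Return the most frequent category and its share, in percent."""
    category = max(distribution, key=distribution.get)
    return category, distribution[category]


for tag in OLD:
    print(tag, default_category(OLD[tag]), default_category(NEW[tag]))
```

For JJ this yields `('I-NP', 84)` in the old categorization and `('AL', 99)` in the new one, matching the figures quoted in the text.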
Another advantage of this categorization concerns the breaker problem.
In the new categorization, the break is a property that each category can have; it is not a competing category.
This corresponds to the following view: the first problem is the membership of an element in a given structure (categorization into NU or A).
The second problem is to determine whether an element is a breaker or not.
The two questions do not compete, as they do in the IOB categorization: if an element is a breaker (tagged B in the old categorization), it must belong to the structure.
The tag PRP offers a good example.
In the old categorization, it is mostly tagged I (91%), and sometimes B.
In the new categorization, it is always tagged NU and always has the break property.
The 9% of B tags correspond to the 9% of cases in which the preceding element is an NP.
Two rules are then sufficient, where a dozen were required with the old categorization.
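The separation between membership and the break property can be illustrated by a small sketch that maps the new representation back to I, O and B tags. It rests on one reading of the PRP example, stated here as an assumption: an element inside the structure is tagged B only when it has the break property and the preceding element also belongs to an NP; otherwise it is tagged I. The class `Element`, the function `to_iob` and the example sequence are hypothetical.

```python
from dataclasses import dataclass

# Categories that place an element inside the structure (assumed set).
IN_STRUCTURE = {"NU", "AL", "AR"}


@dataclass
class Element:
    tag: str               # part-of-speech tag, e.g. "PRP"
    category: str          # "NU", "AL", "AR" or "O"
    breaker: bool = False  # break property, independent of the category


def to_iob(elements):
    """Derive I/O/B tags from categories and break flags (assumed mapping)."""
    iob = []
    previous_inside = False
    for element in elements:
        inside = element.category in IN_STRUCTURE
        if not inside:
            iob.append("O")
        elif element.breaker and previous_inside:
            # A breaker directly after another NP element starts a new NP.
            iob.append("B")
        else:
            iob.append("I")
        previous_inside = inside
    return iob


# Invented sequence: a PRP after an NP element, then a PRP after a non-NP element.
sequence = [
    Element("NN", "NU"),
    Element("PRP", "NU", breaker=True),
    Element("VBG", "O"),
    Element("PRP", "NU", breaker=True),
]
print(to_iob(sequence))  # ['I', 'B', 'O', 'I']
```

Under this reading, both PRP occurrences carry the same category and the same break property; whether the old tag is I or B depends only on the preceding element.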
Hammerton J.
2002-03-13