
Second Experiment: Using Prior Knowledge

We now present results obtained with this prior knowledge. Table 9 shows that using it improves on the best previous result (+0.41, from 92.18 to 92.59 with words). Most significantly, the system using prior knowledge with a context of one element outperforms all previous results, even those obtained with a context of two elements. A longer context does not improve results.

Table 9: Results using prior knowledge (K).

System	K	TBL	w/o K	best old
$\theta$	0.50	-	0.50	0.50
Context	1	2	1	2
w/o words	91.10	90.60	90.07	90.70
with words	92.59	92.03	-	92.18
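
The gains discussed in this section can be checked directly against Table 9. The short Python sketch below recomputes them. The dictionary layout and the names `scores` and `gain` are ours, introduced only for illustration, and the mapping of values to systems assumes the table is read column by column as printed.

```python
# Scores copied from Table 9; keys are (system, with_words).
scores = {
    ("K", False): 91.10, ("K", True): 92.59,
    ("TBL", False): 90.60, ("TBL", True): 92.03,
    ("w/o K", False): 90.07,  # no "with words" score is reported for this column
    ("best old", False): 90.70, ("best old", True): 92.18,
}


def gain(system, baseline, with_words):
    """Score difference between two systems, rounded to two decimals."""
    diff = scores[(system, with_words)] - scores[(baseline, with_words)]
    return round(diff, 2)


print(gain("K", "best old", True))  # gain over the best previous result, with words
print(gain("K", "TBL", True))       # gain over TBL, with words
print(gain("K", "w/o K", False))    # effect of prior knowledge, without words
```

The first value printed is the +0.41 improvement quoted in the text.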

Another advantage is reduced learning time. Because this knowledge strongly constrains the search space, the learning process is 10 times faster.

To illustrate the advantage of the new categorization, here are a few examples of errors that the TBL system produces and ALLiS does not.

1: are still {developed/JJ} but/CC
2: {the/DT buy-out/NN just/RB $/$ 15/CD millions/CD }
3: {late/JJ} {last/JJ year/NN}
4: {creditors/NNS early/JJ} {next/JJ month/NN}

Error (1) arises because the TBL system has to learn each context in which the element JJ is not tagged I (I being the default category automatically assigned to JJ). The problem would have been similar for ALLiS without the new categorization: in this context, ALLiS cannot assign the AL category, since no nucleus occurs after the adjective. Errors (2) and (3) illustrate cases where a break is missing or wrong. Treating the inside and break decisions separately seems to give better results, since most of the errors made by TBL and not by ALLiS are of this kind. Error (4) is also a breaker problem. Suppose ALLiS also recognizes next as a breaker after early. The second chunk would still be wrong, but the first one (creditors) would be right, since the word early/JJ would be categorized as a left adjunct. And since a left adjunct cannot compose a chunk by itself, no NP would be recognized with early. ALLiS' output would be: {creditors/NNS} early/JJ {next/JJ month/NN}.
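
The reasoning behind example (4) can be made concrete with a small sketch. The Python code below is not ALLiS itself: it is a minimal illustration, under our own assumptions, of how a nucleus / left adjunct / breaker categorization keeps a stranded adjunct out of every chunk. The `Token` class, the category labels `"N"`, `"AL"` and `"O"`, and the functions `chunk` and `render` are hypothetical names, and the categories given to each word are assigned by hand, not learned.

```python
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    tag: str
    cat: str               # assumed labels: "N" nucleus, "AL" left adjunct, "O" outside
    breaker: bool = False  # True if the token forces a chunk boundary before it


def chunk(tokens):
    out = []      # finished items: a list of Tokens is a chunk, a bare Token is outside
    current = []  # open chunk; non-empty only once it holds a nucleus
    pending = []  # left adjuncts still waiting for a nucleus

    def close():
        # Emit the open chunk, then any stranded adjuncts as outside tokens.
        if current:
            out.append(list(current))
            current.clear()
        out.extend(pending)
        pending.clear()

    for tok in tokens:
        if tok.breaker or tok.cat == "O":
            close()
        if tok.cat == "N":
            # A nucleus absorbs the adjuncts waiting on its left.
            current.extend(pending)
            pending.clear()
            current.append(tok)
        elif tok.cat == "AL":
            pending.append(tok)
        else:
            out.append(tok)
    close()
    return out


def render(items):
    def show(t):
        return f"{t.word}/{t.tag}"
    parts = []
    for item in items:
        if isinstance(item, list):
            parts.append("{" + " ".join(show(t) for t in item) + "}")
        else:
            parts.append(show(item))
    return " ".join(parts)


# Example (4), with "next" marked as a breaker (hand-assigned categories).
example = [
    Token("creditors", "NNS", "N"),
    Token("early", "JJ", "AL"),
    Token("next", "JJ", "AL", breaker=True),
    Token("month", "NN", "N"),
]
result = render(chunk(example))
print(result)  # {creditors/NNS} early/JJ {next/JJ month/NN}
```

Because early/JJ is a left adjunct with no nucleus before the break, the sketch leaves it outside any chunk, reproducing the output described for example (4).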

The general conclusion we can draw is that a more precise distributional categorization provides better results than a general one (using only I and B) for this task.


Hammerton J. 2002-03-13