Next: Second Experiment: Using Prior Up: The Categorization Process Previous: Linkers


Break

A sequence of several nuclei (and the adjuncts which depend on them) can belong to a single structure or form several adjacent structures. An element is a breaker if its presence introduces a break into a sequence of adjacent nuclei. For example, the presence of the tag DT in the sequence NN DT JJ NN introduces a break before the tag DT, although the sequence NN JJ NN (without DT) can form a single structure in the training corpus.
...[the/DT coming/VBG week/NN] [the/DT foreign/JJ exchange/NN market/NN] ...
The tag DT introduces a break on its left, but some tags introduce a break on their right, or on both their left and their right. For instance, the tag WDT (NU by default) introduces a break on both sides. In other words, this tag cannot belong to the same structure as the preceding adjacent nucleus, nor to the same structure as the following adjacent nucleus.
...[ railroads/NNS and/CC trucking/NN companies/NNS ] [ that/WDT ] began/VBD in/IN [ 1980/CD ] ...
...in/IN [ which/WDT ] [ people/NNS ] generally/RB are/VBP ...
In order to detect which tags have the break property, we build two functions, $f_{\mathrm{b\,left}}$ and $f_{\mathrm{b\,right}}$, described in equations (3) and (4) (NU: {nuclei}, $C_{\mathrm{wb}}$: corpus without brackets).
$\displaystyle f_{\mathrm{b\,left}}(X) = \frac{\sum_{C} NU\ ]\ [\ X}{\sum_{C_{\mathrm{wb}}} NU\ X}$     (3)


$\displaystyle f_{\mathrm{b\,right}}(X) = \frac{\sum_{C} X\ ]\ [\ NU}{\sum_{C_{\mathrm{wb}}} X\ NU}$     (4)

These functions measure the break property of the element X: each is the proportion of adjacent occurrences of X and a nucleus that are separated by a structure boundary. Table 8 shows the values for a selection of tags. An element can be a left breaker (DT), a right breaker (no example for English NP at the tag level), or both (PRP). The break property is generally well marked, so the threshold is easy to set (0.66 in practice).
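The two ratios can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original implementation: the corpus representation (each sentence as a flat list of tags and bracket tokens), the nucleus set `NUCLEI`, the toy corpus, and the names `f_b_left` and `f_b_right` are all hypothetical. The numerator counts the pattern with a closing and an opening bracket between the two elements; the denominator counts plain adjacency once the brackets are removed.

```python
from typing import Callable, List

BRACKETS = {"[", "]"}

# Assumption: a hypothetical nucleus set, chosen only for this illustration.
NUCLEI = {"NN", "NNS"}

# Toy bracketed corpus (invented for illustration): one list of tokens per sentence.
TOY_CORPUS: List[List[str]] = [
    ["[", "DT", "VBG", "NN", "]", "[", "DT", "JJ", "NN", "NN", "]"],
    ["[", "NN", "NN", "]", "VBD", "[", "NN", "]", "[", "NN", "]"],
]


def _break_ratio(corpus: List[List[str]],
                 first_ok: Callable[[str], bool],
                 second_ok: Callable[[str], bool]) -> float:
    """Ratio of 'A ] [ B' occurrences to 'A B' occurrences without brackets."""
    broken = 0
    adjacent = 0
    for sent in corpus:
        # Numerator: A and B separated by exactly a closing and an opening bracket.
        for a, close, open_, b in zip(sent, sent[1:], sent[2:], sent[3:]):
            if first_ok(a) and close == "]" and open_ == "[" and second_ok(b):
                broken += 1
        # Denominator: A immediately followed by B once brackets are stripped.
        flat = [tok for tok in sent if tok not in BRACKETS]
        for a, b in zip(flat, flat[1:]):
            if first_ok(a) and second_ok(b):
                adjacent += 1
    # Assumption: return 0.0 when the pair is never adjacent in the corpus.
    return broken / adjacent if adjacent else 0.0


def f_b_left(corpus: List[List[str]], nuclei: set, x: str) -> float:
    """Equation (3): break between a preceding nucleus and X."""
    return _break_ratio(corpus, lambda t: t in nuclei, lambda t: t == x)


def f_b_right(corpus: List[List[str]], nuclei: set, x: str) -> float:
    """Equation (4): break between X and a following nucleus."""
    return _break_ratio(corpus, lambda t: t == x, lambda t: t in nuclei)
```

On the toy corpus, `f_b_left(TOY_CORPUS, NUCLEI, "DT")` compares one `NN ] [ DT` occurrence with one `NN DT` adjacency, while `f_b_left(TOY_CORPUS, NUCLEI, "NN")` compares one `NN ] [ NN` occurrence with three `NN NN` adjacencies.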


Table 8: Values of the functions for some elements.
TAG     f_b left      f_b right
DT      0.97 (yes)    0.00 (no)
PRP     0.97 (yes)    0.68 (yes)
POS     0.95 (yes)    0.00 (no)
PRP$    0.94 (yes)    0.00 (no)
JJ      0.44 (no)     0.00 (no)
NN      0.04 (no)     0.11 (no)
NNS     0.03 (no)     0.14 (no)
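As a small illustration of how the 0.66 threshold turns the values of Table 8 into a yes/no decision, the following Python sketch classifies each tag. The function name `break_property` and the dictionary layout are hypothetical, and the use of `>=` rather than `>` at the threshold is an assumption; the text does not say how a value exactly equal to the threshold is treated.

```python
# Threshold reported in the text.
THRESHOLD = 0.66

# Values copied from Table 8: tag -> (left value, right value).
TABLE_8 = {
    "DT": (0.97, 0.00),
    "PRP": (0.97, 0.68),
    "POS": (0.95, 0.00),
    "PRP$": (0.94, 0.00),
    "JJ": (0.44, 0.00),
    "NN": (0.04, 0.11),
    "NNS": (0.03, 0.14),
}


def break_property(f_left: float, f_right: float,
                   threshold: float = THRESHOLD) -> str:
    """Return 'left', 'right', 'both' or 'none' for a pair of function values."""
    # Assumption: a value equal to the threshold counts as a breaker.
    is_left = f_left >= threshold
    is_right = f_right >= threshold
    if is_left and is_right:
        return "both"
    if is_left:
        return "left"
    if is_right:
        return "right"
    return "none"


if __name__ == "__main__":
    for tag, (left, right) in TABLE_8.items():
        print(tag, break_property(left, right))
```

With these values, DT, POS and PRP$ come out as left breakers, PRP as a breaker on both sides, and JJ, NN and NNS as non-breakers, which matches the yes/no annotations in the table.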



Hammerton J. 2002-03-13