Learning Semantic Lexicons from a Part-of-Speech and Semantically Tagged Corpus Using Inductive Logic Programming

Vincent Claveau, Pascale Sébillot, Cécile Fabre, Pierrette Bouillon; 4(Aug):493-525, 2003.


This paper describes an inductive logic programming learning method designed to acquire specific Noun-Verb (N-V) pairs from a corpus in order to build semantic lexicons based on the principles of Pustejovsky's generative lexicon (GL) (Pustejovsky, 1995). Such pairs are relevant in information retrieval applications, where they can be used to perform index expansion. In one of the components of this lexical model, called the qualia structure, words are described in terms of semantic roles. For example, the telic role indicates the purpose or function of an item (cut for knife), the agentive role its mode of creation (build for house), etc. The qualia structure of a noun is mainly made up of verbal associations, encoding relational information. The learning method enables us to automatically extract, from a morpho-syntactically and semantically tagged corpus, N-V pairs whose elements are linked by one of the semantic relations defined in the qualia structure in GL. It also infers rules explaining what in the surrounding context distinguishes such pairs from others that also occur in sentences of the corpus but are not relevant. Emphasis is placed on the learning efficiency required to deal with all the available contextual information and to produce linguistically meaningful rules.
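
As an informal illustration of the notions above, the following Python sketch shows one way a qualia structure and a contextual rule over tagged N-V pairs might be represented. It is not the paper's implementation: the class names, the tag values ("VERB", "NOUN", "artefact", and so on), the example sentences, and the rule itself are all hypothetical, and an actual inductive logic programming system would induce such rules as logic clauses rather than have them written by hand.

```python
from dataclasses import dataclass, field


@dataclass
class QualiaStructure:
    """Toy qualia structure: each role maps a noun to a set of verbs."""
    noun: str
    telic: set = field(default_factory=set)      # purpose or function
    agentive: set = field(default_factory=set)   # mode of creation

    def related_verbs(self):
        """All verbs linked to the noun by either role."""
        return self.telic | self.agentive


@dataclass(frozen=True)
class Token:
    lemma: str
    pos: str   # hypothetical part-of-speech tag
    sem: str   # hypothetical semantic tag


def rule_verb_before_noun(tokens, n_idx, v_idx):
    """Hypothetical hand-written rule (not a learned one): the verb
    precedes the noun with at most two tokens in between."""
    return (
        tokens[v_idx].pos == "VERB"
        and tokens[n_idx].pos == "NOUN"
        and 0 < n_idx - v_idx <= 3
    )


def coverage(rule, examples):
    """Count the (tokens, noun_index, verb_index) examples the rule accepts."""
    return sum(1 for tokens, n_idx, v_idx in examples if rule(tokens, n_idx, v_idx))


# Roles taken from the examples in the abstract.
knife = QualiaStructure("knife", telic={"cut"})
house = QualiaStructure("house", agentive={"build"})

# Invented tagged sentences, each paired with a candidate N-V pair.
positives = [(
    [Token("cut", "VERB", "action"), Token("with", "PREP", "none"),
     Token("the", "DET", "none"), Token("knife", "NOUN", "artefact")],
    3, 0,   # pair (knife, cut)
)]
negatives = [(
    [Token("knife", "NOUN", "artefact"), Token("be", "VERB", "state"),
     Token("on", "PREP", "none"), Token("the", "DET", "none"),
     Token("table", "NOUN", "furniture")],
    0, 1,   # pair (knife, be)
)]

positives_covered = coverage(rule_verb_before_noun, positives)
negatives_covered = coverage(rule_verb_before_noun, negatives)
```

A rule of this kind is useful to the extent that it accepts many relevant pairs and few irrelevant ones; the learning method described here searches for such rules automatically instead of relying on a single hand-picked condition.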