SEECS Colloquium

Exploiting Multilingual Corpora in Acquiring Lexical Knowledge

Dr. Suzanne Stevenson
Department of Computer Science
University of Toronto

Tuesday, April 16, 2002
1:30 PM
CSB 232


Abstract

Much of the grammatical knowledge that is needed for natural language technologies is associated with individual words. This poses a challenge for achieving robust and wide-coverage systems, since new or unanticipated words (or uses of words) are frequently encountered. Our approach to this problem is to automatically classify verbs, thereby extending our encoding of known uses of words to new words or uses. We focus on verbs because they are the seat of the predicate-argument knowledge that guides the essential interpretation of "who did what to whom". Because verbs that share semantic properties tend to be similar in their surface syntactic behavior, we have been able to use simple statistics over a corpus to successfully classify verbs into classes that share both semantic and syntactic properties.

In recent work, we have developed a new approach to automatic classification that brings to bear the power of multilingual resources. Syntactic behavior can reveal the underlying semantics of verbs, but languages differ in the degree to which semantic properties are expressed overtly in extractable features. Thus, combining features over verbs in one language with features over the translations of those verbs in another language can increase the information available to the classifier. As one example of the success of the method, we have improved the classification of English verbs by augmenting an English feature set with features based on Chinese translations of the verbs, achieving an accuracy of 85% (compared to 68% using English-only features) on a task with a 33% baseline. This is joint work with Paola Merlo (U. of Geneva) and Vivian Tsang (U. of Toronto).


About the Speaker

Suzanne Stevenson received the B.A. degree in Computer Science and Linguistics from the College of William and Mary in 1981, and the M.S. and Ph.D. degrees in Computer Science from the University of Maryland, College Park, in 1987 and 1994, respectively. From 1994 to 2000, she was an Assistant Professor at Rutgers University, holding joint appointments in the Department of Computer Science and in the Rutgers Center for Cognitive Science (RuCCS). Since July 2000, she has been on the faculty of the Department of Computer Science at the University of Toronto, where she holds the position of Associate Professor.

Professor Stevenson's primary area of research is computational linguistics, specifically looking at problems of combining linguistic and statistical approaches in natural language parsing, ambiguity resolution, and automatic acquisition of grammatical knowledge from large text corpora. In July 2000, she received an NSERC University Faculty Award for her research on these topics. In 1997, she received the NSF CAREER award for her work in computational modeling of ambiguity resolution in natural language understanding.