Home page for John J. Tanner
jjtanner (at) cs.ucf.edu
This is the home page for John J. Tanner.
The paper on ontological selectional preferences for non-pertainym
adjectives presented at AAAI-10 is here.
Here are some supplemental notes to the paper.
- We found that the paired t-test for the results between the Normalized LKM and the Lemmata Fraction method for the third band had a p-value of 1.837e-09, still negligible.
- The only bigrams analyzed were those consisting of a lemma that could be an adjective in WordNet followed by a lemma that could be a noun. This holds in particular when calculating Fr(a * __), as in the note below.
- In the Scaled Lemmata Fraction Method, the threshold function k(b) and the expression k(b(a)) were awkwardly presented. It would be better to write k(a), where a is the adjective currently being analyzed and k(a) = q(b(a)) * N / Fr(a * __).
- In all of the computational methods that we tried, we used only the vetted lemmata based on WordNet 2.1, even the LKM-based methods.
- Any references to WordNet 2.0 are typos: this should be WordNet 2.1.
- We chose an unusual method for counting the number of classes a lemma belongs to. In order to penalize highly polysemous lemmata, we counted the number of classes for each sense separately and summed these counts. After the paper was published, we tried a more usual approach: counting each distinct class of a lemma once, no matter how many different senses it corresponds to. This improved the scores for LKM by around a point in the first frequency band and by 3 to 4 points in the lower two frequency bands, but LKM's performance was so poor that even this did not help much compared to the other methods. There was an improvement of just over one percentage point for normalized LKM on the first frequency band, yet there is still a comfortable performance gap between it and the Lemmata Fraction method in that band. Also, in the second frequency band, there was little difference between the normalized LKM method and the Lemmata Fraction method. Recall that in the original experiment, there was only a small difference, which was barely significant. The scores for normalized LKM on the second and third frequency bands showed only minor differences.
- We detected an error in the software. When calculating the LKM methods, the software did not screen out potential plural nouns. When calculating the denominators for the Lemmata Fraction methods, it did not screen for plural nouns or NLTK stopwords. Testing revealed that this error introduced only minor differences in the test results.
- Finally, the bibliography mentions a paper by Rooth et al. published in 1999. This was published in ACL-99, not ACL-09 as the bibliography mistakenly claims.
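The two class-counting schemes described in the notes above can be sketched as follows. This is only an illustration, not the software used for the paper: the function names, the sense labels, and the class names in the toy data are all hypothetical, and the mapping from senses to classes is assumed to be already available as a dictionary.

```python
from typing import Dict, Set


def count_classes_per_sense(senses: Dict[str, Set[str]]) -> int:
    """Count classes for each sense separately and sum the counts.

    A class reached through two different senses is counted twice,
    which penalizes highly polysemous lemmata.
    """
    return sum(len(classes) for classes in senses.values())


def count_distinct_classes(senses: Dict[str, Set[str]]) -> int:
    """Count each distinct class once, regardless of how many senses reach it."""
    return len(set().union(*senses.values()))


# Toy data (hypothetical): three senses of one lemma and the classes of each.
toy_senses = {
    "sense_1": {"institution", "organization"},
    "sense_2": {"slope", "object"},
    "sense_3": {"institution", "object"},
}

per_sense_total = count_classes_per_sense(toy_senses)  # 2 + 2 + 2 = 6
distinct_total = count_distinct_classes(toy_senses)    # 4 distinct classes
print(per_sense_total, distinct_total)
```

For a lemma whose senses share classes, the per-sense sum is larger than the distinct count, and that gap is the polysemy penalty mentioned above.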
The database of selectional preferences, as well as a utility to
demonstrate their power for aiding Noun Sense Disambiguation (NSD), will be
added in the near future.
2010 July 29: We have experienced some difficulties. Please bear with
us. Thank you.
2010 Sept. 7: After further testing and verification, we present the
database of preferences, first as a zipped directory of files for each
adjective. Each file consists of a series of lines, each of which contains the name
of a WordNet category (in category_name.n.01 format, 01 representing a
sense number), followed by a tab and then a preference score. Each file is
sorted by preference score in increasing order.
Adjective-Noun Preferences V. 1.0
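A file in the format described above (category name, tab, preference score, sorted by increasing score) could be read with a short sketch like the one below. The function name and the sample lines are hypothetical, and the sketch assumes the scores are written as ordinary decimal numbers; the sample values are invented for illustration and do not come from the database.

```python
import io
from typing import Iterable, List, Tuple


def parse_preferences(lines: Iterable[str]) -> List[Tuple[str, float]]:
    """Parse 'category_name.n.NN<TAB>score' lines into (category, score) pairs."""
    prefs = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        category, score = line.split("\t")
        prefs.append((category, float(score)))
    return prefs


# Hypothetical sample standing in for one adjective's file.
sample = io.StringIO("entity.n.01\t0.12\nfood.n.01\t0.57\nfruit.n.01\t0.93\n")
prefs = parse_preferences(sample)

# Because the file is sorted in increasing order of score,
# the last entry has the highest preference score.
strongest = prefs[-1]
print(strongest)
```

To read a real file, an open file object can be passed in place of the `io.StringIO` sample, since both yield lines when iterated.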
2012 July 30: The online demonstration is now available.