Resnik and Yarowsky (1997) made a set of observations about the state of the art in automatic
word sense disambiguation and, motivated by those observations, offered several specific
proposals regarding improved evaluation criteria, common training and testing resources,
and the definition of sense inventories. Subsequent discussion of those proposals resulted
in SENSEVAL, the first evaluation exercise for word sense disambiguation (Kilgarriff and
Palmer 2000). This article is a revised and extended version of our 1997 workshop paper,
reviewing its observations and proposals and discussing them in light of the SENSEVAL exercise.
It also includes a new in-depth empirical study of translingually based sense inventories
and distance measures, using statistics collected from native-speaker annotations of 222
polysemous contexts across 12 languages. These data show that monolingual sense distinctions
at most levels of granularity can be effectively captured by translations into some set of second
languages, especially as language family distance increases. In addition, the probability that
a given sense pair will lexicalize differently across languages is shown to correlate
with semantic salience and sense granularity; sense hierarchies automatically generated from
such distance matrices yield results remarkably similar to those created by professional
monolingual lexicographers.