Multi-Prototype Vector Space Models of Word Meaning
Authors: Joseph Reisinger & Raymond J. Mooney
Review by: Nitish Gupta
Roll number: 10461

Introduction
• Automatically judging the degree of semantic similarity between words is an important task.
• It is useful in text classification, information retrieval, textual entailment, and other language processing tasks.
• The empirical approach to measuring semantic similarity between words relies on the Distributional Hypothesis, i.e. that similar words appear in similar contexts.
• Traditionally, each word type is represented by a single “prototype” vector of contextual features derived from co-occurrence information.
• Semantic similarity is then measured with some vector distance measure.

Motivation
• Traditional vector-space models represent a word with a single “prototype” vector that is independent of context, but the meaning of a word clearly depends on context.
• A single-prototype model cannot handle phenomena such as homonymy and polysemy. It also cannot account for the fact that word meanings violate the triangle inequality when viewed at the level of word types.
• E.g. the word club is similar to both bat and association, although bat and association are not similar to each other. How similar club is to each of them depends on the context in which club is used.

Methodology
• The authors present a new vector-space model that represents a word’s meaning by a set of distinct “sense-specific” vectors. Each word is therefore represented by multiple vectors, each corresponding to a different context in which the word is used.
• For each word ‘w’:
Step 1: For each occurrence of ‘w’, a vector is computed from its context, which is a 10-word window around the word.
Step 2: The occurrence vectors are grouped into ‘K’ clusters using a movMF model (mixture of von Mises-Fisher distributions), which models semantic relatedness using cosine similarity. This yields a set of centroids π_k(w), k = 1, …, K, one per cluster, for each word ‘w’.
• The clusters are not assumed to correspond to the senses of the word; rather, the authors rely on the clusters to capture meaningful variation in word usage.

Methodology
[Figure: obtaining clusters from the different contextual occurrences of the word ‘Position’.]
• The black star shows the centroid of all the vectors, as a single-vector model would compute it.
• The clusters and colored stars show the sense-specific prototype vectors for the different contexts in which the word ‘Position’ was used in the corpus.

Measuring Semantic Similarity
• Given two words w and w’, the authors define two non-contextual clustered similarity metrics for measuring the similarity of isolated words:
AvgSim(w, w’) = (1/K²) Σ_{j=1..K} Σ_{k=1..K} d(π_k(w), π_j(w’))
MaxSim(w, w’) = max_{1≤j≤K, 1≤k≤K} d(π_k(w), π_j(w’))
where d(·, ·) is the cosine similarity.
• In AvgSim, word similarity is the average similarity over all pairs of prototype vectors of the two words. Since every pair of prototypes contributes, two words are judged similar only if many of their senses are similar.
• In MaxSim, word similarity is the maximum over all pairwise prototype similarities. Since only the closest pair of prototypes contributes, two words are judged similar if even one pair of their senses is very close.

Experimental Evaluation
• The corpora used by the authors are:
  • A snapshot of Wikipedia taken on Sept. 29th, 2009, with Wikitext markup and articles of fewer than 100 words removed.
  • The third edition of the English Gigaword corpus, with articles of fewer than 100 words removed.

Judging Semantic Similarity
• The models are first evaluated by comparing their lexical similarity measurements to human similarity judgments from the WordSim-353 dataset.
• Spearman’s rank correlation (ρ) with the average human judgments was used to measure the quality of the various models. For K ∈ [2, 10] on Wikipedia and K > 4 on the Gigaword corpus, Spearman’s correlation is in the range 0.6 to 0.8.

Predicting Near-Synonyms
• Here the multi-prototype model’s ability to determine the words most closely related to a target word is tested. The top ‘k’ most similar words were computed for each prototype of each target word.
• For each prototype of each word, a result from the multi-prototype model and a result from the single-prototype model are shown to a human rater. Quality is measured by how frequently the rater chooses the multi-prototype result.
• The results show that the system performs considerably better on homonymous words than on polysemous words, but with the right number of clusters it also gives good results on polysemous words.

Thank You!!
Questions
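
Backup: Sketch of the Prototype and Similarity Computation
The methodology and the two similarity metrics can be illustrated with a minimal sketch. It is not the authors' implementation: spherical k-means (hard assignment by cosine similarity) is swapped in for the movMF mixture, the initialisation simply takes the first K points, and the function names and the toy context vectors are made up for illustration.

```python
import math
from itertools import product


def normalize(v):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine similarity, playing the role of d(., .) in the slides."""
    return dot(normalize(u), normalize(v))


def spherical_kmeans(vectors, k, iters=20):
    """Stand-in for movMF (assumption): returns k unit-length centroids."""
    points = [normalize(v) for v in vectors]
    centroids = [list(p) for p in points[:k]]  # naive deterministic init
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # For unit vectors the dot product equals the cosine similarity.
            best = max(range(k), key=lambda j: dot(p, centroids[j]))
            groups[best].append(p)
        # New centroid = normalised sum of members; empty clusters keep the old one.
        updated = [
            normalize([sum(col) for col in zip(*g)]) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def avg_sim(protos_a, protos_b):
    """AvgSim: mean cosine similarity over all pairs of prototypes."""
    sims = [cosine(p, q) for p, q in product(protos_a, protos_b)]
    return sum(sims) / len(sims)


def max_sim(protos_a, protos_b):
    """MaxSim: cosine similarity of the closest pair of prototypes."""
    return max(cosine(p, q) for p, q in product(protos_a, protos_b))


# Toy occurrence vectors (made-up numbers, not corpus data).
club_contexts = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
bat_contexts = [[1.0, 0.0, 0.0], [0.9, 0.0, 0.1], [0.0, 0.0, 1.0], [0.1, 0.0, 0.9]]

club_protos = spherical_kmeans(club_contexts, k=2)
bat_protos = spherical_kmeans(bat_contexts, k=2)
print("AvgSim:", avg_sim(club_protos, bat_protos))
print("MaxSim:", max_sim(club_protos, bat_protos))
```

On this toy data the two words share only one of their two usage clusters, so MaxSim comes out high while AvgSim stays noticeably lower, which is the contrast between the two metrics described on the Measuring Semantic Similarity slide.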
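
Backup: Sketch of the Spearman Evaluation
The WordSim-353 evaluation ranks word pairs by model similarity and by human score and then correlates the two rankings. The sketch below shows one way to compute Spearman's ρ as the Pearson correlation of average ranks. The helper names are illustrative and the scores are invented placeholders, not WordSim-353 values or results from the paper.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder scores for four hypothetical word pairs.
human_scores = [7.5, 1.2, 9.0, 3.3]
model_scores = [0.60, 0.10, 0.95, 0.40]
print("rho:", spearman(human_scores, model_scores))
```

Because only the ordering matters, the model's similarity values do not need to be on the same scale as the human judgments.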