Understanding and Enhancing the Folding-in Method in Latent Semantic Indexing

Xiang Wang, Xiaoming Jin
Intelligent Data Engineering Group, School of Software, Tsinghua University
17th International Conference on Database and Expert Systems Applications (DEXA ’06)

Outline

1 Background
2 Motivation
3 Our Results
4 Experiments
5 Summary
6 Q&A

Vector Space Model for Text Retrieval

The Problem of VSM

The dimension of the vector space, which is the number of terms in all documents, can be very high in practice, typically in the thousands.
The effectiveness and efficiency of text retrieval based on VSM suffer from the curse of dimensionality.
Latent Semantic Indexing (LSI) was proposed to solve the problem of high dimensionality of VSM.

Latent Semantic Indexing

1. Given a term-document matrix $A$, perform Singular Value Decomposition (SVD) on $A$: $A = U \Sigma V^T$.
2. Reduce the dimension of $A$ to $k$: $A_k = U_k \Sigma_k V_k^T$.
3. Project the original document vectors onto a lower-dimensional subspace: $\bar{d} = U_k^T d$.

The Folding-in Method

The computational complexity of LSI is high, mainly due to the SVD process.
The folding-in method is used as an approximation to LSI.
Instead of performing SVD on $A$, it performs SVD on $A_1$, which is a sample of the columns of $A$. $A_1$ is sometimes called the training set.

Pros and Cons of the Folding-in Method

Pros
Possibly the best choice when no prior knowledge is available.
Easy to implement, with low computational complexity.

Cons
The effectiveness of the folding-in method relies on proper sampling.
There is no explicit way to justify the effectiveness of a selected training set.
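
The three LSI steps and the folding-in approximation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`truncated_left_basis`, `project`, `fold_in`) and the toy matrix are assumptions made for the example, and documents are assumed to be stored as columns of $A$.

```python
import numpy as np

def truncated_left_basis(A, k):
    """Return U_k, the k leading left singular vectors of A (steps 1 and 2)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return U[:, :k]

def project(U_k, D):
    """Step 3: map document columns D into the k-dimensional space, d_bar = U_k^T d."""
    return U_k.T @ D

def fold_in(A, train_idx, k):
    """Folding-in: run the SVD only on the sampled columns A_1, then project all of A."""
    A1 = A[:, train_idx]              # training set: a sample of the columns of A
    U_k = truncated_left_basis(A1, k)
    return project(U_k, A)

# Toy term-document matrix: 8 terms (rows), 6 documents (columns).
rng = np.random.default_rng(0)
A = rng.random((8, 6))
k = 2

lsi_docs = project(truncated_left_basis(A, k), A)   # full LSI
folded_docs = fold_in(A, [0, 2, 4, 5], k)           # SVD on 4 of the 6 documents
print(lsi_docs.shape, folded_docs.shape)
```

The only difference between the two paths is which matrix the SVD sees, which is why the choice of training columns decides how close the folded-in representation is to full LSI.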
Our Contributions

Understanding the Folding-in Method
We showed, from a linear algebra point of view, that the folding-in method is in essence a subspace tracking process with partial information.

Enhancing the Folding-in Method
We proposed a novel training set selection strategy, which is deterministic and more effective.

The Folding-in Method as a Subspace Tracking Process

$A_k = U_k \Sigma_k V_k^T$, where $U_k^T U_k = I$.
Denote the range space of $A_k$ by $S_k$; then $U_k U_k^T$ is an orthogonal projection from $\mathbb{R}^m$ onto $S_k$.
$S_k$ is called the semantic subspace, which is considered to represent the latent semantic structure of the original document vectors.

The Folding-in Method as a Subspace Tracking Process, cont.

The projection of $d$ in the original vector space $S$ onto the lower-dimensional subspace $S_k$ is $\bar{d}$.
$w = \|\bar{d}\|_2 / \|d\|_2$ equals the cosine of the angle between $d$ and the subspace $S_k$.
A larger $w$ implies that $d$ is closer to the semantic subspace we pursue.
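
The closeness measure $w$ above can be computed directly from $U_k$. The sketch below is illustrative only: the function name `closeness` is an assumption, $d$ is assumed to be non-zero, and the basis used in the example (the first two coordinate axes of $\mathbb{R}^3$) is chosen purely so the values are easy to check by hand.

```python
import numpy as np

def closeness(U_k, d):
    """w = ||U_k^T d||_2 / ||d||_2, the cosine between d and the subspace spanned by U_k.

    Assumes U_k has orthonormal columns and d is a non-zero vector.
    """
    d_bar = U_k.T @ d
    return np.linalg.norm(d_bar) / np.linalg.norm(d)

# Hypothetical semantic subspace: the plane spanned by the first two axes of R^3.
U_k = np.eye(3)[:, :2]

print(closeness(U_k, np.array([1.0, 0.0, 0.0])))  # inside the subspace: w = 1
print(closeness(U_k, np.array([0.0, 0.0, 1.0])))  # orthogonal to it:    w = 0
print(closeness(U_k, np.array([1.0, 0.0, 1.0])))  # 45 degrees away:     w = 1/sqrt(2)
```

Because $U_k$ has orthonormal columns, $\|U_k^T d\|_2 = \|U_k U_k^T d\|_2$, so the ratio is the same whether the norm is taken in the reduced coordinates or in the original space.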
A Novel Training Set Selection Strategy

Principle
The document vectors closest to the target semantic subspace are chosen as the training set.

Algorithm
Input: $A$, $k$, $n_1$
Output: $A_1$
1. Find $U_k$ for $A$.
2. Compute $w_i = \|v_i\|_2$ for all $1 \le i \le n$.
3. The $n_1$ documents with the largest $w_i$ are selected as the columns of $A_1$, which is the training set.

Implementation of Our Algorithm

Problems
The strength of our method comes from using the latent information contained in the semantic subspace during the training set selection process.
However, as mentioned before, it is impractical to compute the semantic subspace over a very large document collection, which is exactly why the folding-in method is adopted.

Solutions
Instead of computing the semantic subspace for all documents, we perform the training set selection process on different subsets of the original document collection.
Further selection can be performed on the preliminary results.

Data Preparation

Identifier   MED    CISI   NPL
Documents    1033   1460   11429
Terms        5735   5544   7536
Queries      30     35     93

Table: Corpora used in the experiments

Experiment Settings

Similarity search was performed on each data set.
Average precision was used as the evaluation metric.
The results of LSI were used as ground truth.
The competitor is random sampling: 100 different random samples were drawn, and their best and average performances were recorded as Rand-best and Rand-avg, respectively.

Experimental Results

Figure: Average precision with respect to LSI over the MED and CISI collections

Experimental Results, cont.

Figure: Average precision with respect to LSI over the NPL collection

Experimental Results, cont.

Figure: Retrieval performance of the gradual method over the NPL collection
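
The training set selection strategy from the earlier slides, its subset-based variant, and the random-sampling competitor can be sketched as follows. Several details are assumptions made for illustration and are not taken from the slides: the score is computed as $w_i = \|U_k^T a_i\|_2 / \|a_i\|_2$ for each document column $a_i$ (the slides write $w_i = \|v_i\|_2$ without defining $v_i$), the subsets are formed as contiguous blocks of columns, every column is assumed non-zero, and all function names are hypothetical.

```python
import numpy as np

def closeness_scores(A, k):
    """Score each document column by its cosine to the rank-k subspace of A."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    proj_norms = np.linalg.norm(U_k.T @ A, axis=0)
    doc_norms = np.linalg.norm(A, axis=0)   # assumed non-zero for every document
    return proj_norms / doc_norms

def select_training_set(A, k, n1):
    """Deterministic selection: indices of the n1 documents with the largest w_i."""
    w = closeness_scores(A, k)
    order = np.argsort(-w, kind="stable")
    return np.sort(order[:n1])

def select_from_subsets(A, k, n1, n_subsets):
    """Subset-based variant: select within each block, then select among the winners."""
    n = A.shape[1]
    candidates = []
    for block in np.array_split(np.arange(n), n_subsets):
        local = select_training_set(A[:, block], k, min(n1, len(block)))
        candidates.extend(block[local])
    candidates = np.array(candidates)
    final = select_training_set(A[:, candidates], k, n1)
    return np.sort(candidates[final])

def random_training_set(n, n1, rng):
    """Competitor: n1 distinct document indices drawn uniformly at random."""
    return np.sort(rng.choice(n, size=n1, replace=False))

# Toy collection: 10 terms, 12 documents.
rng = np.random.default_rng(1)
A = rng.random((10, 12)) + 0.1
print(select_training_set(A, k=2, n1=5))
print(select_from_subsets(A, k=2, n1=3, n_subsets=3))
print(random_training_set(A.shape[1], 5, rng))
```

The subset-based variant never runs an SVD on the full collection, only on one block at a time and then on the pooled preliminary selections, which is the point of the "Solutions" slide.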
Summary

We theoretically justified the effectiveness of the folding-in method from a linear algebra point of view.
A novel training set selection strategy was proposed in a greedy style.
The idea of incremental subspace tracking can be further developed.

Thank You