Latent Semantic Indexing/Analysis
LSI (Latent Semantic Indexing)

 


Search engines need a technique to index pages. One such technique is known as Latent Semantic Indexing (LSI). It computes the singular value decomposition (SVD) of a matrix that records how often each word occurs on each page, and uses the result to compare pages with the term entered. So when a search term is typed, LSI can find near matches, particularly when exact matches cannot be found. This enables search engines to display more comprehensive results.
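As a rough illustration of the decomposition step, the sketch below builds a small word-by-page count matrix and truncates its SVD with NumPy. The toy corpus, the helper names (term_document_matrix, truncated_svd) and the choice of k = 2 are assumptions made for this example; it is not a description of how any particular search engine implements LSI.

```python
import numpy as np

# Hypothetical toy corpus: each string stands in for one page.
docs = [
    "doctors hospital patient",
    "physicians hospital patient",
    "tree forest leaf",
    "tree graph node",
]

def term_document_matrix(docs):
    """Return (vocab, A) where A[i, j] counts word i on page j."""
    vocab = sorted({w for d in docs for w in d.split()})
    row = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[row[w], j] += 1
    return vocab, A

def truncated_svd(A, k):
    """Keep only the k largest singular values and their vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

vocab, A = term_document_matrix(docs)
U_k, s_k, Vt_k = truncated_svd(A, k=2)

# Rank-k approximation of the original counts: the "latent" view of the corpus.
A_k = U_k @ np.diag(s_k) @ Vt_k

print(A.shape)    # (number of distinct words, number of pages)
print(A_k.shape)  # same shape as A, but of rank k
```

Keeping only the largest singular values discards fine detail about individual word choices and retains the broad patterns of which words appear together, which is what allows near matches later on.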

LSI gives value to related words in a document or page. This makes it easier to return detailed search results. Spam sites, for instance, are stuffed with related words just to manipulate the crawlers. LSI is claimed to detect such stuffing and to reduce the value of related words on a spam site. Likewise, pages that focus strictly on one keyword or phrase find it difficult to rank in search engines that use LSI. Web pages that use a variety of keywords, key phrases, and word forms (such as different tenses) throughout the content can more easily achieve a good search engine ranking.

LSI addresses two properties of language: synonymy and polysemy. Expressing the same idea with different words is called synonymy. The phenomenon of a word having multiple meanings is known as polysemy. Because of these two phenomena, a search engine that matches only exact words may fail to retrieve a relevant document that does not contain the words in the query (a search for "doctors" may not return a document containing the word "physicians"), and at the same time may retrieve irrelevant documents that use the query words in a different sense (a botanist and a computer scientist searching for "tree" want different pages).
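The "doctors" and "physicians" case can be sketched in a few lines. The example below compares plain word matching with a lookup in a two-dimensional latent space; the four-page corpus, the function names (exact_match, lsi_similarities) and the value k = 2 are all assumptions chosen to keep the illustration small.

```python
import numpy as np

# Hypothetical pages; page 1 never mentions "doctors".
docs = [
    "doctors hospital patient",     # 0
    "physicians hospital patient",  # 1
    "tree forest leaf",             # 2: the botanist's "tree"
    "tree graph node",              # 3: the computer scientist's "tree"
]
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

# Word-by-page count matrix.
A = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w], j] += 1

def exact_match(query):
    """Indices of pages that literally contain the query word."""
    return [j for j, d in enumerate(docs) if query in d.split()]

def lsi_similarities(query, k=2):
    """Cosine similarity between the query and each page in a k-dim latent space.

    Assumes the query contains at least one word from the vocabulary.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T
    q = np.zeros(len(vocab))
    for w in query.split():
        if w in row:
            q[row[w]] += 1
    q_k = (q @ U_k) / s_k  # fold the query into the latent space
    norms = np.linalg.norm(V_k, axis=1) * np.linalg.norm(q_k)
    return (V_k @ q_k) / norms

print(exact_match("doctors"))       # only page 0 contains the word itself
print(lsi_similarities("doctors"))  # pages 0 and 1 score close to 1, pages 2 and 3 close to 0
```

Page 1 is found because "physicians" shares its neighbours ("hospital", "patient") with "doctors". Note that both "tree" pages land in the same latent direction in this toy setup, so it illustrates synonymy only; it does not separate the two senses of "tree".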


LSI tries to interpret keywords in relation to all the words on the page. With the help of statistical analysis, it can detect that documents share words which have been used in the same context. For example, pages containing "apple" and "computer" often also contain "Mac OS", so pages about "Mac OS" are treated as relevant as well. The approach is more like understanding the intent of the user query and retrieving information based on that intent.
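The "apple", "computer" and "Mac OS" example can be made concrete by comparing words rather than pages. In the sketch below the count matrix is written out by hand; the four pages, the token "macos" (standing in for "Mac OS") and k = 2 are assumptions for illustration only. "apple" and "macos" never appear on the same page, yet both appear alongside "computer".

```python
import numpy as np

terms = ["apple", "computer", "macos", "fruit", "pie", "recipe"]
t = {w: i for i, w in enumerate(terms)}

# Rows are words, columns are four hypothetical pages:
#   page 0: apple computer      page 2: fruit pie recipe
#   page 1: computer macos      page 3: pie recipe
A = np.array([
    [1, 0, 0, 0],  # apple
    [1, 1, 0, 0],  # computer
    [0, 1, 0, 0],  # macos
    [0, 0, 1, 0],  # fruit
    [0, 0, 1, 1],  # pie
    [0, 0, 1, 1],  # recipe
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
T = U[:, :k] * s[:k]  # each row is one word's position in the latent space

# Raw counts: the two words share no page, so they look unrelated.
print(cosine(A[t["apple"]], A[t["macos"]]))
# Latent space: both words sit in the "computer" direction.
print(cosine(T[t["apple"]], T[t["macos"]]))
# A word from the cooking pages stays unrelated.
print(cosine(T[t["apple"]], T[t["fruit"]]))
```

With raw counts the similarity of "apple" and "macos" is zero; in the latent space it is close to 1, while "apple" and "fruit" remain close to 0 because they never share a context in this corpus.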
Last Updated on Thursday, 27 August 2009 15:38
 
