Lucene supports the TREC data sets for measuring the relevance of a search engine. When you evaluate relevance with a metric such as NDCG or MRR, you need a golden ranking order for each query to compare against.
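To make the two metrics concrete, here is a minimal sketch in plain Python. The function names and the graded relevance values are my own illustrations, not anything from Lucene, and this NDCG builds the ideal ordering only from the results that were actually returned, which is a simplification.

```python
import math

def dcg(gains):
    # Discounted cumulative gain: the gain at 0-based rank i is divided by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    # Normalize by the DCG of the same gains sorted best-first (the golden order).
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

def mrr(first_relevant_ranks):
    # Each entry is the 1-based rank of the first relevant result for one query,
    # or None when the query returned no relevant result.
    total = sum(1.0 / r for r in first_relevant_ranks if r)
    return total / len(first_relevant_ranks)
```

For example, `ndcg([3, 2, 1])` is 1.0 because the results are already in golden order, while `ndcg([1, 2, 3])` is lower because the best document is ranked last.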
Big companies with financial resources get the golden ranking order for search queries through crowdsourcing services such as CrowdFlower or Amazon Mechanical Turk. How can a startup with minimal resources get a golden ranking order? One way is to use user clicks. If you remove position bias and compute the click-through rate for each search result, you can potentially get a golden ordering by sorting the results by click-through rate. How do you remove position bias? You can aggregate the click-throughs at each position of the result list, across the search queries in a category or other grouping, and use that per-position rate as a heuristic for position bias.
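Here is a rough sketch of that heuristic, assuming a click log of (query, doc, position, clicked) tuples that has already been filtered to one category. The log format and the function names are hypothetical. The correction I use here is to divide a document's clicks by the clicks its positions would be expected to produce, which is sometimes called clicks over expected clicks; it is one possible way to apply the per-position rates, not the only one.

```python
from collections import defaultdict

def position_ctr(log):
    # Aggregate click-through rate per result position over the whole log
    # (assumed to hold the queries of a single category or grouping).
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for _query, _doc, pos, was_clicked in log:
        shown[pos] += 1
        clicked[pos] += int(was_clicked)
    return {pos: clicked[pos] / shown[pos] for pos in shown}

def golden_order(log, query):
    # Score each document shown for `query` as
    # observed clicks / clicks expected from its positions alone.
    prior = position_ctr(log)
    clicks = defaultdict(float)
    expected = defaultdict(float)
    for q, doc, pos, was_clicked in log:
        if q != query:
            continue
        clicks[doc] += int(was_clicked)
        expected[doc] += prior[pos]
    score = {d: clicks[d] / expected[d] for d in clicks if expected[d] > 0}
    return sorted(score, key=score.get, reverse=True)
```

A document that earns as many clicks at position 2 as another earns at position 1 ends up ranked higher, because fewer clicks were expected of it there.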
Another question still pops up. If you rely on user clicks, users can only click the results you show them, so how do you evaluate the relevance of results you never show? A potential solution is an explore/exploit strategy. If we assume that the true top 10 documents are somewhere in our top 20-30 results, we can show different users a different random set of 10 results from that pool and learn the golden ranking from their clicks.
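The exploration half of that idea could look something like the sketch below. The function name, the default pool size of 30 and the page size of 10 are assumptions taken from the numbers above, and this version is pure random exploration; a real system would likely mix it with the normal ranking for most traffic.

```python
import random

def explore_page(ranked_candidates, page_size=10, pool_size=30, rng=random):
    # Take the top `pool_size` documents from the current ranking and show
    # this user a random subset of `page_size` of them, in random order.
    pool = ranked_candidates[:pool_size]
    return rng.sample(pool, min(page_size, len(pool)))
```

Each call gives one user's page, for example `explore_page(doc_ids)` where `doc_ids` is the current ranking for the query. Logging the shown positions and clicks from these pages gives click data for documents that the normal top 10 would never expose.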