Sunday, December 15, 2013
Mathematics of Decision Making
Have you ever had to choose one option out of several? Throughout our lives, we make such decisions. We may need to find a spouse and choose one of n potential candidates. We may need to hire a programmer for the team, interviewing m people to find the one candidate. Did you explore a career opportunity recently? You have to pick one job after interviewing with m companies and make a stopping decision.
Do you see the pattern? I was looking around to see whether there is an optimal number of options to examine before committing to one. I stumbled upon the secretary problem (http://en.wikipedia.org/wiki/Secretary_problem), which suggests an optimal cutoff for exploration: observe the first n/e candidates without choosing any of them, then pick the first subsequent candidate who is better than all of those. This strategy selects the single best candidate with probability about 1/e.
It is very interesting to see how random processes can guide us on when to stop looking for more options and settle on one.
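As a rough illustration (not from the original problem statement), here is a small Python simulation of the n/e stopping rule. Modeling candidate quality as uniform random scores is an assumption on my part; it is only a sketch of the idea, not a proof.

```python
import math
import random

def simulate(n, trials=100_000):
    """Estimate how often the n/e stopping rule picks the single best candidate."""
    cutoff = int(n / math.e)              # observe the first n/e candidates without choosing
    best_picked = 0
    for _ in range(trials):
        scores = [random.random() for _ in range(n)]
        threshold = max(scores[:cutoff]) if cutoff else float("-inf")
        pick = scores[-1]                 # forced to take the last one if nothing beats the threshold
        for s in scores[cutoff:]:
            if s > threshold:             # take the first candidate better than everyone seen so far
                pick = s
                break
        best_picked += (pick == max(scores))
    return best_picked / trials

print(simulate(100))                      # comes out close to 1/e, roughly 0.37
```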
Saturday, March 2, 2013
Crowd Sourcing or User Sourcing
Lucene has the TREC dataset for measuring the relevance of a search engine. When you try to measure relevance with methods such as NDCG or MRR, you need a golden ranking order to compare against.
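To make that concrete, here is a minimal sketch of one common way NDCG is computed against a golden ranking. The relevance scale and function names are my own illustration, not anything taken from Lucene or TREC.

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevance labels."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """NDCG: DCG of the engine's ranking divided by DCG of the ideal (golden) ranking."""
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal_dcg if ideal_dcg > 0 else 0.0

# relevance labels come from the golden ranking order (e.g. 3 = perfect, 0 = bad)
print(ndcg([3, 2, 3, 0, 1]))
```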
Big companies with financial resources obtain the golden ranking order for search queries through crowdsourcing platforms such as CrowdFlower or Amazon Mechanical Turk. How can a startup with minimal resources get a golden ranking order? One way is to use user clicks. If you remove position bias and compute the click-through rate for search results, you can potentially get a golden ordering by sorting the results by click-through rate. How do you remove position bias? One heuristic is to aggregate click-throughs at each position in the search results, across queries within a category or grouping, and use the per-position rates as an estimate of position bias.
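Here is a bare-bones sketch of that idea: estimate a per-position click rate, then divide each document's clicks by the rate for the position it was shown at. The click-log layout and the numbers are made up purely for illustration.

```python
from collections import defaultdict

# click_log: (query, position, doc_id, clicked) records gathered from search logs (illustrative)
click_log = [
    ("laptop", 1, "d1", 1), ("laptop", 1, "d1", 0),
    ("laptop", 2, "d2", 1), ("laptop", 3, "d3", 0),
]

# 1. Estimate position bias: aggregate click-through rate at each position across queries.
pos_clicks, pos_views = defaultdict(int), defaultdict(int)
for _, pos, _, clicked in click_log:
    pos_clicks[pos] += clicked
    pos_views[pos] += 1
position_ctr = {p: pos_clicks[p] / pos_views[p] for p in pos_views}

# 2. Bias-corrected click-through per (query, doc): weight each click by 1 / position rate.
doc_clicks, doc_shows = defaultdict(float), defaultdict(float)
for query, pos, doc, clicked in click_log:
    doc_clicks[(query, doc)] += clicked / max(position_ctr[pos], 1e-9)
    doc_shows[(query, doc)] += 1
corrected = {k: doc_clicks[k] / doc_shows[k] for k in doc_clicks}

# 3. The golden order for a query is its documents sorted by corrected click-through.
golden = sorted((d for q, d in corrected if q == "laptop"),
                key=lambda d: corrected[("laptop", d)], reverse=True)
print(golden)
```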
Another question still pops up: if you are using user clicks, don't users only click the results you show them? How do you evaluate relevance for results you never show? A potential solution is an explore/exploit strategy. If we make the assumption that the eventual top 10 documents are somewhere in the top 20-30 results, we can randomly show different users different sets of 10 results and learn the golden ranking from their clicks.
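A simple sketch of that explore/exploit step, assuming the top 20-30 candidates have already been retrieved by the ranker; the function and variable names here are illustrative, not part of any library.

```python
import random

def results_to_show(candidates, k=10, explore_rate=0.2):
    """Exploit the current top-k most of the time; occasionally show a random
    k-subset of the larger candidate pool so lower-ranked documents also
    collect clicks and can surface in the learned golden ranking."""
    if random.random() < explore_rate:
        return random.sample(candidates, k)   # exploration: random 10 from the top 20-30
    return candidates[:k]                     # exploitation: current best 10

top_30 = [f"doc{i}" for i in range(30)]       # assumed candidate pool from the ranker
print(results_to_show(top_30))
```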