“Until recently, I was at Microsoft Live Labs, working with big data and treating web search and advertising as a recommender problem. From 2004 to 2007, I founded and ran Findory.com, a startup that sought to personalize the flow of information. From 1997 to 2002, I was at Amazon.com where I wrote the recommendation engine used by Amazon.com and later led the software team that developed Amazon’s personalization systems. I have Masters’ degrees in business and computer science from Stanford GSB and University of Washington.”
A while back, someone asked this question on Stack Overflow:
“I’m in the final year of computer science and engineering. I’m looking for ideas about a year-long project, which is of great importance in my curriculum. So, I’m looking for a topic that is not very simple but which can be implemented within a year.
My interests are
- Data Mining
- Distributed Computing
- Data Structures and Algorithms
- Neural Networks
Also, I’m open to suggestions regarding the same for any other field that could be of value to me, as I’m interested in higher studies post my bachelor’s degree.
I want to improve my coding skills, so I’m interested in doing the implementation in C++/Java.
Please help me with some advice.”
And I answered with this suggestion:
“Sounds like a KDD project to me 🙂 Since the Netflix winner’s algorithms are not out, you might want to implement the blended algorithms outlined in the Progress Prizes:
“As in some of Yehuda’s past work, he combines two models, one a latent factor model, the other an item-item approach. The models yielded ‘the best results published so far’ on the Netflix data set by allowing them to represent temporal effects such as finding stronger relationships between items related in a short timeframe, handling that people tend to give higher ratings to older movies (if they bother to rate them at all), allowing for people to shift to giving higher or lower ratings on average over time, and capturing that people tend to use the same rating for multiple items rated in a short timeframe.
The paper is full of other cute tidbits too, like that they tried to detect day of the week effects — do people rate lower on Mondays? — but could not. They also discovered an unusual jump in the average rating in the data in 2004, which they hypothesize was due to features launched on the Netflix.com site that started showing people more movies they liked. Definitely worth a read.”
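For anyone tempted to take up the project, here is a rough sketch of what “blending” the two models could look like. It is not the method from the paper: the toy ratings, the function names (`fit`, `predict`), the exponential time decay with constant `tau`, and the fixed blend `weight` are all assumptions of mine, chosen only to illustrate the idea of combining a latent factor model with an item-item model whose relationships are stronger for items rated close together in time.

```python
import math
import random
from collections import defaultdict

# Toy (user, item, rating, day) tuples; every value here is invented.
RATINGS = [
    ("ann", "alien", 5.0, 1), ("ann", "brazil", 4.0, 2), ("ann", "casablanca", 2.0, 40),
    ("bob", "alien", 4.0, 3), ("bob", "brazil", 5.0, 3), ("bob", "casablanca", 1.0, 50),
    ("cat", "alien", 2.0, 5), ("cat", "casablanca", 5.0, 6),
]

def fit(ratings, k=2, epochs=200, lr=0.02, reg=0.05, tau=30.0, seed=0):
    # Baselines: global mean, then average item offset, then average user offset.
    mu = sum(r for _, _, r, _ in ratings) / len(ratings)
    item_dev, user_dev = defaultdict(list), defaultdict(list)
    for _, i, r, _ in ratings:
        item_dev[i].append(r - mu)
    b_i = {i: sum(v) / len(v) for i, v in item_dev.items()}
    for u, i, r, _ in ratings:
        user_dev[u].append(r - mu - b_i[i])
    b_u = {u: sum(v) / len(v) for u, v in user_dev.items()}

    def resid(u, i, r):
        return r - mu - b_u[u] - b_i[i]

    # Latent factor model: SGD on the residual the baselines leave behind.
    rng = random.Random(seed)
    p = {u: [rng.uniform(-0.1, 0.1) for _ in range(k)] for u in b_u}
    q = {i: [rng.uniform(-0.1, 0.1) for _ in range(k)] for i in b_i}
    for _ in range(epochs):
        for u, i, r, _ in ratings:
            err = resid(u, i, r) - sum(a * b for a, b in zip(p[u], q[i]))
            for f in range(k):
                pu, qi = p[u][f], q[i][f]
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)

    # Item-item similarity: co-ratings made close together in time weigh more.
    rows = defaultdict(list)
    for u, i, r, t in ratings:
        rows[u].append((i, resid(u, i, r), t))
    num, den = defaultdict(float), defaultdict(float)
    for rated in rows.values():
        for i, ri, ti in rated:
            for j, rj, tj in rated:
                if i != j:
                    w = math.exp(-abs(ti - tj) / tau)
                    num[i, j] += w * ri * rj
                    den[i, j] += w * (ri * ri + rj * rj) / 2
    sim = {key: num[key] / den[key] for key in num if den[key] > 0}
    return {"mu": mu, "b_u": b_u, "b_i": b_i, "p": p, "q": q, "sim": sim, "rows": rows}

def clip(x):
    # Keep predictions on the 1-to-5 star scale.
    return max(1.0, min(5.0, x))

def predict(model, u, i, weight=0.5):
    base = model["mu"] + model["b_u"][u] + model["b_i"][i]
    # Latent factor prediction: baseline plus the user-item factor dot product.
    factor = clip(base + sum(a * b for a, b in zip(model["p"][u], model["q"][i])))
    # Item-item prediction: baseline plus similarity-weighted residuals
    # of the other items this user has rated.
    num = den = 0.0
    for j, rj, _ in model["rows"][u]:
        s = model["sim"].get((i, j))
        if s is not None and j != i:
            num += s * rj
            den += abs(s)
    neighbor = clip(base + (num / den if den else 0.0))
    # Blend: a fixed linear mix of the two predictions (weight is an assumption).
    blended = weight * factor + (1 - weight) * neighbor
    return factor, neighbor, blended

if __name__ == "__main__":
    model = fit(RATINGS)
    print(predict(model, "cat", "brazil"))
```

In a real project the blend weight would be tuned on held-out ratings rather than fixed at 0.5, and the other temporal effects mentioned above (drifting user averages, same-day rating habits) would each need their own terms.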
My take on this: Netflix has given us a gift by publishing these blended models. Whoever can make sense of those models, put them in a black box, and offer it to the world is going to be a leader in the Post-Search Web.