State of the Netflix Prize

Kudos to Gravity for their recent advancement. It's nice to see them, or at least the combined Gravity/Dinosaur-fossil team, challenging BellKor for the top spot.

My week-long blending experiment ended short of my 5th place goal, reaching only 8.24%. I'm continuing to add new result sets and prune weaker ones. I'm using something akin to a genetic algorithm to choose which new result sets to generate: I look at which result sets are most strongly represented in the blend right now and generate new result sets with similar (but different) parameters, while cutting out any result sets that fall beneath a specific threshold. I don't even care any more whether the individual results get good scores; an individual score is just not a very good predictor of how well a result set will do in the blend.
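For anyone curious what that selection loop might look like, here is a minimal sketch. It is not my actual code: the function name, the threshold, the number of parents and children, and the way parameters are perturbed are all assumptions made up for illustration.

```python
import random

def next_generation(weights, params, threshold=0.01, n_parents=2,
                    n_children=2, jitter=0.1, seed=0):
    """Prune weak result sets and propose variations of the strongest ones.

    weights: {name: blend weight}, params: {name: {param_name: float}}.
    All names and defaults here are hypothetical.
    """
    rng = random.Random(seed)
    # Keep only result sets whose blend weight clears the threshold.
    survivors = {name: p for name, p in params.items()
                 if abs(weights[name]) >= threshold}
    # The most heavily weighted survivors become parents.
    ranked = sorted(survivors, key=lambda n: abs(weights[n]), reverse=True)
    children = {}
    for parent in ranked[:n_parents]:
        for i in range(n_children):
            # Similar (but different) parameters: scale each one by up to +/- jitter.
            children[f"{parent}_child{i}"] = {
                key: value * (1 + rng.uniform(-jitter, jitter))
                for key, value in params[parent].items()
            }
    return survivors, children

weights = {"svd_a": 0.50, "svd_b": 0.005, "nnet_a": 0.20}
params = {name: {"learning_rate": 0.01, "regularization": 0.05} for name in weights}
survivors, children = next_generation(weights, params)
```

Each child would then be trained and added to the pool before the next blending run.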

The Slow Climb

This week, I've put together a package of 100 result sets (I actually didn't mean for it to be a round number... it just turned out that way) to do a long generation process on. I've cranked up all the slowness parameters on my blender, and I plan to just run it all week, submitting partial results each day as it fine-tunes the results. My 8.16% submission two days ago came from a fast and conservative run of the blender over one night; the gap between my probe results and the qualifying results was a healthy 0.54% (meaning that my probe score was 7.62%). Yesterday's submission was an overnight run with everything set to slow and optimistic (a.k.a. overfitting), using Bootstrap Aggregation (thanks to Todd Lipcon for recommending this to me a few weeks ago) to reduce overfitting error. I scored better, with 8.21% on the qualifying set, but my gap was down to 0.23% (meaning that my probe score was 7.98%). I'm looking forward to my submission tomorrow morning, where I'll see firsthand whether Bootstrap Aggregation works as I think it does (I don't have any formal training in statistics). I'm hoping that my gap will increase, but I don't know by how much. I don't think it should get up to 0.54% as I have it set, but it might get up to 0.3% or so, which would be nice enough. (update: it didn't increase at all... back to the drawing board)
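As a rough illustration of the Bootstrap Aggregation idea, here is a minimal sketch applied to a plain linear blend. My real blender is not a simple least-squares fit, so treat the function names, the small ridge term, and the number of bags as assumptions. The idea is to refit the blend on samples drawn with replacement from the probe set and average the results; for a linear blend, averaging the weights is the same as averaging the predictions.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_blend(preds, ratings, ridge=1e-6):
    """Least-squares blend weights; preds[i][k] is result set k's prediction for rating i."""
    k = len(preds[0])
    ata = [[sum(p[i] * p[j] for p in preds) + (ridge if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    atb = [sum(p[i] * r for p, r in zip(preds, ratings)) for i in range(k)]
    return solve(ata, atb)

def bagged_blend(preds, ratings, n_bags=25, seed=0):
    """Average the blend weights fitted on n_bags bootstrap samples."""
    rng = random.Random(seed)
    n, k = len(ratings), len(preds[0])
    total = [0.0] * k
    for _ in range(n_bags):
        # Draw n rows with replacement from the probe data.
        idx = [rng.randrange(n) for _ in range(n)]
        w = fit_blend([preds[i] for i in idx], [ratings[i] for i in idx])
        total = [t + wi for t, wi in zip(total, w)]
    return [t / n_bags for t in total]
```

With `preds` holding each result set's probe predictions and `ratings` the true probe ratings, `bagged_blend(preds, ratings)` returns one averaged weight per result set.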

I have a curious note to share about my blending program's performance. I am using two systems with identical Intel Core 2 Quad Q6600 CPUs and PC6400 RAM in dual-channel mode. Both are running Ubuntu, but only one is running a windowing environment (GNOME). The other has the window system disabled and is running only in text mode. Here's the problem: My blender runs through a bootstrap sample once every three hours on the computer without a window environment and once every six hours (about half as fast) on the one running GNOME. They both are using the same amount of CPU time, and both are showing in top as using 397%+ CPU time consistently (though the one without a windowing environment is pegged at 400% most of the time). So why the difference in performance? They both have the same parameters set, and they're both running from the same directory, so they have identical program code - and if you think the network might be the issue, the slower one is the one that has all files hosted on itself. The most-often accessed data for the blending program is about 100K in size. I believe what's happening is that the other programs being activated continuously under X are flushing the CPU cache (I haven't figured out whether it's the L1 or L2 cache that's primarily involved), causing this data to be continuously reloaded; on the machine running only a command-line interface, the cache is undisturbed and cache misses are much rarer.

Busy Busy

I've been very busy over the last few days with a variety of things.

Of course, I was happy to move up to #6 in the Netflix Prize. I'm still working at that, though I've mostly just had time to manage computer time and not really game out new methods.

I'm also starting to shift my role toward helping other people out. I've started writing the first part of what I plan to be a Netflix Prize "jumpstart" series, to help get people into some basic methods for dealing with the Netflix Prize. I keep thinking that the conventional wisdom is to make people work harder for it, and that's probably good and true to some extent, but I never saw anybody complain about Brandyn Webb's posts, which provided a more hands-on guide to follow. Many people have instead used his work as a framework to get started. Also, there are a number of excellent papers floating around on different methods, but many of them are generalized (as good scientific papers should be) and thus are relatively inaccessible to many contest participants who might otherwise be able to provide great contributions.

I'm thinking of a four-part series:

1) Data structure and organization
2) Non-negative matrix factorization
3) Directed neural net with back-propagation for rating predictions
4) Directed neural net for result blending

I'm thinking that my goal will be to provide enough accessible fundamentals for people to get started on these while leaving more elaborate tweaks and extensions to the reader.

Sharing

Along with the excitement of climbing the Netflix Prize leaderboard come some other issues to ponder.

Firstly, I need a job. If you know of anyone looking for someone like me, please feel free to let them or me know. (d a n [at] t i l l b e r g [dot] u s)

Secondly, to every person or team that I push down as I climb up: It's just unavoidable. I'm sorry.

Finally... When I get close to as high as I think I can get for a while (I'm not quite there yet), should I publish my methods? Source code?

I think that this contest is winnable. I'm not sure that this contest is winnable by me. I am positive that this contest is not winnable by me based on the methods I have right now. I don't actually think that I am using any methods that somebody else hasn't already thought of. I did come up with a fair amount of it on my own, though, so I at least have something to share (for a funny example, I had a eureka moment when I thought of putting the sum of squares of all elements from my SVD matrices in the minimization function, only to later discover that that's what everyone already did).
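For readers who haven't seen it, that "sum of squares" term is what is usually called L2 regularization. Written out, the minimization function would look something like the following, where the notation is my own assumption for this sketch: r_{ui} is the rating user u gave movie i, p_u and q_i are the user and movie rows of the two SVD matrices P and Q, K is the set of known ratings, and lambda is a small constant chosen by hand.

```latex
\min_{P,\,Q} \;
  \sum_{(u,i) \in K} \left( r_{ui} - p_u^{\top} q_i \right)^2
  \;+\; \lambda \left( \sum_{u} \lVert p_u \rVert^2 + \sum_{i} \lVert q_i \rVert^2 \right)
```

The first sum is the usual squared prediction error; the second is the sum of squares of every element of both matrices, which penalizes large factor values and so discourages overfitting.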

I do think it would be interesting to have a betting market where people could buy and sell shares of Netflix Prize teams based on their estimates of each team's chances of winning. I'm sure BellKor would sell well, having held the lead for a long time. I can't imagine the Dan Tillberg valuation would be high. However, I do think that there is value for me in this competition outside the million dollar prize. Actually, I've never figured that any participant's mean expected return would be any portion of the million dollar prize at all.

There is value in the educational component. At the start of November, I had no experience with neural networks. I also gained some experience with NVIDIA CUDA, Intel Compiler SSE optimizations, and OpenMP. I've learned that 6GB RAM is not enough (on the other hand, a terabyte of hard disk space is enough, for now... my data files weigh in around 300GB, though most of that is retired). And I've honed my programming skills in many ways.

There is also value in the networking component. I've met a number of interesting folks in the field, both spectators and participants. My website gets hits from all sorts of curious domains. I think that if I were to at some point post comprehensive tutorials on the methods I've used to get where I am, the idea flow would increase a lot. It would give my website a purpose. It might even raise my hosting bill.

In any case, I'm currently leaning toward publishing my methods in the future. I don't think I'll publish source code or anything that makes it too easy for someone to compile, run, and pollute the leaderboard. I would like to hear any thoughts on this, though. (d a n [at] t i l l b e r g [dot] u s) Thanks!