Time to Rejoin the Netflix Prize Party
Friday, May 30, 2008 | by Dan Tillberg
It's been four months since I put any serious effort into the Netflix Prize competition. But now that I'm unemployed, it's time to start the fire back up!
I hope to tackle two projects in the next couple weeks:
Firstly, though I missed the deadline for the Netflix-KDD workshop, I'd like to finish the Jumpstart Series which I started long ago (and which itself needs a bit of a jump start).
Secondly, it's time to harness the power of these PlayStation 3s sitting next to my computer. I set them up long ago and have spent a fair amount of time learning about the architecture, but I have yet to actually write any applications for them.
The key to the PS3s is their IBM-made Cell processors. The Cell is a hybrid 9-core processor: one general-purpose core (a dual-threaded PowerPC-style core, which shows up to software as two processors) and 8 streaming processing cores (two of these are unavailable in the PlayStation, making for only 6 streaming cores). The general-purpose core accesses memory through an automatic cache (like any desktop processor). The streaming processors access memory through DMA calls, and each has 256KB of local memory (and 128 128-bit registers); this means that pre-fetching is all done manually, so a well-written program can really use the full processing power of each streaming processor. Beyond that, they're just the 128-bit SIMD cores you would expect. Each can execute SIMD instructions at about 3GHz (IBM quotes 32 Gflops each, but this is quite misleading, even though it has been achieved in real use: it comes from (the lab-achieved 4GHz) * (128/32 = 4 flops per instruction) * (2 ops per cycle for floating multiply+add instructions)).
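As a quick sanity check on that arithmetic, here is a small Python sketch of the peak-throughput formula. The function name and parameters are made up for illustration, and the 3.0 figure is just the approximate clock mentioned above, not a measured value.

```python
def peak_gflops(clock_ghz, simd_bits=128, float_bits=32, ops_per_lane=2):
    """Theoretical single-precision peak for one streaming core, in Gflops.

    ops_per_lane=2 counts a fused multiply+add as two floating-point ops.
    """
    lanes = simd_bits // float_bits  # 128/32 = 4 floats per SIMD instruction
    return clock_ghz * lanes * ops_per_lane


quoted_per_core = peak_gflops(4.0)       # the lab-achieved 4GHz clock
approx_ps3_per_core = peak_gflops(3.0)   # assumption: roughly 3GHz in the PS3
approx_ps3_total = 6 * approx_ps3_per_core  # 6 usable streaming cores

print(quoted_per_core, approx_ps3_per_core, approx_ps3_total)
```

These are theoretical ceilings only: they assume every cycle issues a multiply+add on full 4-wide vectors, which real code rarely manages.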
The Cell sits sort of halfway between desktop CPUs and GPUs (graphics processing units, i.e. high-end video cards) in terms of programming difficulty and also in performance vs. specificity. GPUs are often better-suited (and faster per dollar) for certain computational tasks, but they suffer some drawbacks that may be difficult or impossible to overcome, depending on the task at hand. For example, nVidia GPUs may only execute one program at a time across all of their cores, while the Cell cores may each run entirely independent programs (they can also work together, such as in a processing chain, since memory transfers between local memories are very fast). This limitation of the GPUs is no problem if you can reduce/rewrite your problem into something akin to very large or simultaneous matrix operations. But if not, it can be difficult to harness the full power of the GPU.
I've found feed-forward neural nets particularly difficult to accelerate using GPUs, but I think that the Cell should be perfect for the task.
The real challenge, as I mentioned just after the New Year, is that the PS3 has only about 200MB of RAM (!!!) available to the user. As many readers might note, this is just too small to hold the Netflix Prize data, which compacts to about 300MB fairly easily before squeezing it further becomes tricky. My solution to this, depending on how things work out (especially in terms of how much memory is used by the OS, app code, libraries, networking, etc.), will be one of: a) split the data between two PS3s, at 150MB of training data each, or b) get one or more additional PS3s and split the data up more.
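The choice between the two options comes down to simple arithmetic on how much memory is really left per machine. A minimal sketch, assuming the roughly 300MB data size and 200MB of user RAM from above; the 80MB overhead figure is purely hypothetical, and the function names are made up:

```python
import math


def nodes_needed(data_mb, usable_mb_per_node):
    """Smallest number of machines whose combined usable RAM holds the data."""
    return math.ceil(data_mb / usable_mb_per_node)


def shard_mb(data_mb, n_nodes):
    """Size of each shard if the data is split evenly."""
    return data_mb / n_nodes


DATA_MB = 300      # compacted training data, per the text
USER_RAM_MB = 200  # RAM available to the user on one PS3, per the text

# Option (a): nearly all 200MB is usable for data.
n_a = nodes_needed(DATA_MB, USER_RAM_MB)
print(n_a, shard_mb(DATA_MB, n_a))

# Option (b): hypothetical 80MB lost to code, libraries, and buffers.
n_b = nodes_needed(DATA_MB, USER_RAM_MB - 80)
print(n_b, shard_mb(DATA_MB, n_b))
```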
Many training algorithms I've used thus far should adapt easily to working in a cluster by just regularly "averaging" the weights/parameters between all the active nodes.
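To make the averaging idea concrete, here is a minimal single-process Python sketch. It is one possible reading of the scheme (synchronous averaging after every local pass), not a description of any particular implementation; the toy linear model, the function names, and the data are all assumptions for illustration.

```python
def local_sgd(weights, shard, lr):
    """One SGD pass of least-squares linear regression over one node's shard."""
    w = list(weights)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def average_weights(all_weights):
    """Element-wise mean of the weight vectors reported by each node."""
    n = len(all_weights)
    return [sum(ws) / n for ws in zip(*all_weights)]


def train_cluster(shards, dim, rounds, lr):
    """Each round: every node trains locally from the shared weights, then all average."""
    w = [0.0] * dim
    for _ in range(rounds):
        local = [local_sgd(w, shard, lr) for shard in shards]
        w = average_weights(local)
    return w


# Toy data generated from y = 2*x0 + 1*x1, with x1 fixed at 1 as a bias input.
shards = [
    [((1.0, 1.0), 3.0), ((2.0, 1.0), 5.0)],  # held by node 1
    [((3.0, 1.0), 7.0), ((0.0, 1.0), 1.0)],  # held by node 2
]
w = train_cluster(shards, dim=2, rounds=500, lr=0.05)
print(w)  # should end up close to [2.0, 1.0]
```

In a real cluster the list comprehension in `train_cluster` would be replaced by network exchanges between machines, and how often to average becomes a trade-off between communication cost and how far the nodes drift apart.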
Anyway, time to actually get to work...
Moving to Providence
Thursday, May 15, 2008 | by Dan Tillberg
I think it might be a little odd for a journal entry planning a move to NYC to be followed by a journal entry making a move to Providence, but that's what I'm doing.
I don't know why that matters to most visitors here, though, as a) visitors to this site are, on average (mean), probably about 3000-4000 miles away, and b) Providence is only 30-40 miles away from Needham, where I have been living. For a few readers, it may be interesting news, though.
Also, I'm still hoping to write up something for the Netflix-KDD Workshop. We'll see about that deadline, though...
Update
Thursday, February 21, 2008 | by Dan Tillberg
Alas, a few weeks have passed since my last update. Between illness and job interviews and figuring out whether I could/should move to New York City...
Thanks to everybody that contacted me about possible employment opportunities!
I'm joining the team at Amie Street. Amie Street is a start-up based in Long Island City (part of Queens in New York City). They've pioneered a system of quasi-market-based pricing on digital music tracks (sans DRM). Each track starts out free and, depending on demand, can increase up to a maximum price of $0.98. Additionally, users are encouraged (by bonus credit to purchase additional music) to "REC" newer tracks that they feel are destined for greatness, as measured by the tracks' future prices.
And for the record, I got connected with Amie Street through my participation in the Netflix Prize contest. As I was previously rather poorly-credentialed, the Netflix Prize has served as an excellent environment for me to exhibit my skills. This, despite the allure of the million-dollar prize (which is oddly closer than I ever figured it would be), was always my aim in participating in this contest. Thus, thank you Netflix for the excellently-defined contest!
Deep Sea Fiber Optic Cables
Monday, February 4, 2008 | by Dan Tillberg
There have been a few (three?) incidents of deep sea fiber optic cables being severed in the last week. The New York Times ran an article on the first two: http://www.nytimes.com/2008/01/31/business/worldbusiness/31cable.html.
Many people are trying to work out why these happened so close together in time, but I'm not sure there's much reason to throw on the tin foil hat for these events. The affected nations aren't pointing fingers of blame, as far as I know, and some reasonably good explanations are bound to come along. Harder still is the question of who would benefit from the rather minor interruptions the cuts caused.
But it does raise an issue: the next time some idiots get the bright idea to start a major war between powerful nations, will the Internet infrastructure we rely on nowadays go bye-bye? We've recently seen China's interest in destroying satellites (think GPS, think non-cabled communications). How crippling would that be to economic/commercial/industrial infrastructure if combined with the severing of fiber optic cables around the world? Obviously, militaries should already take this into account for war-planning. But businesses probably tend to be more happy-go-lucky in this regard.
The second issue is whether events like the loss of fiber optic connections have any predictive value. If some idiot wants to start a small war somewhere, there would most likely be certain preparations they would make beforehand. It seems that if you are going to invade your neighbor (say, for example, China wanted to "reassert its authority" over the renegade province of Taiwan), you might benefit from a mysterious sudden interruption in that nation's connection to the outside world and the temporary confusion it would cause in the opening salvos of the conflict.