i've noticed two trending topics in the tech world today: social graph manipulation and map-reduce algorithms. in the last blog, i gave a quick guide to setting up hadoop, an open-source map-reduce implementation, along with an example of using hive, a sql-like database layer that sits on top of it. while that's one reasonable use of map-reduce, this time we'll explore its more algorithmic uses, taking a glimpse at both of these trendy topics along the way!
if you're exploring hive as a technology and are looking to move beyond "hello, world", here's a little recipe for a simple but satisfying first task using hive and hadoop. we'll work through setting up a clustered installation of hive and hadoop, then import an apache log file and query it using hive's sql-like language.
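to give a taste of where we're headed, here's a rough hiveql sketch of that first task: mapping an apache combined log onto a table with the RegexSerDe from hive's contrib jar, then running a sql-style query over it. the jar path, table name, column names, and log path below are my own placeholders, and the regex will likely need tweaking for your exact log format:

```sql
-- make the contrib serde available (path is an assumption; adjust for your install)
ADD JAR /usr/lib/hive/lib/hive-contrib.jar;

-- one column per field of the apache combined log format
CREATE TABLE apache_log (
  host STRING, identity STRING, user STRING, time STRING,
  request STRING, status STRING, size STRING,
  referer STRING, agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\" ([0-9]*) ([^ ]*) \"([^\"]*)\" \"([^\"]*)\"")
STORED AS TEXTFILE;

-- pull a local log file into the table
LOAD DATA LOCAL INPATH '/tmp/access.log' INTO TABLE apache_log;

-- a first satisfying query: the ten most-requested urls
SELECT request, COUNT(*) AS hits
FROM apache_log
GROUP BY request
ORDER BY hits DESC
LIMIT 10;
```

under the hood, hive compiles that group-by into a map-reduce job across the cluster, which is exactly the plumbing we'll set up below.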
unless you happen to have three physical linux servers lying around, you may want to create your base debian linux servers using a virtualization technology such as xen. for a good guide on setting up xen, go here. for the remainder of this tutorial, i'll assume that you have three debian (lenny) servers at your disposal.
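if you do go the xen route, the debian xen-tools package can stamp out the guests for you once the dom0 is working. the hostnames, disk size, and memory below are just placeholders i picked, so adjust them to taste:

```shell
# on the xen dom0: install the guest-creation helpers
apt-get install xen-tools

# create three lenny guests; names and sizes are placeholders
for i in 1 2 3; do
  xen-create-image --hostname=hadoop$i --dist=lenny \
    --size=10Gb --memory=512Mb --dhcp
done
```

each run drops a guest config under /etc/xen/, which you can then boot with `xm create`.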
let's get started