user warning: Table './johnandcailincmsdb/node_counter' is marked as crashed and should be repaired query: SELECT totalcount, daycount, timestamp FROM node_counter WHERE nid = 1667 in /var/www/drupal/includes/ on line 172.

exploring apache log files using hive and hadoop

if you're exploring hive as at technology, and are looking to move beyond "hello, world", here's a little recipe for a simple but satisfying first task using hive and hadoop. we'll work through setting up a clustered installation of hive and hadoop, and then import an apache log file and query it using hive's SQL-like language.

unless you happen to have three physical linux servers at your disposal, you may want to create your base debian linux servers using a virtualization technology such as xen. for a good guide on setting up xen, go here. for the remainder of this tutorial, i'll assume that you have three debian (lenny) servers at your disposal.

let's get started

syndicate content