tech

log4drupal now available on github

both the 5.x and 6.x versions are now available for download on github. sorry, i just can't do CVS anymore. to download:

  1. start by going here: http://github.com/cailinanne/log4drupal
  2. then click the all tags drop-down and choose the appropriate version
  3. then click the download button

a full description of the module is available here

breadth-first graph search using an iterative map-reduce algorithm

i've noticed two trending topics in the tech world today: social graph manipulation and map-reduce algorithms. in the last blog, i gave a quickie guide to setting up hadoop, an open-source map-reduce implementation, and an example of how to use hive - a sql-like database layer on top of that. while this is one reasonable use of map-reduce, this time we'll explore its more algorithmic uses, while taking a glimpse at both of these trendy topics!
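to give a feel for the idea, here is a minimal in-memory sketch in python, not the hadoop job from the full post. it assumes an unweighted graph stored as an adjacency dict, and the names map_node, reduce_node and bfs are made up for illustration. each pass maps every node to "distance + 1" messages for its neighbours, reduces by keeping the smallest distance seen per node, and repeats until a pass changes nothing.

```python
from collections import defaultdict

INF = float("inf")  # distance of a node that has not been reached yet


def map_node(node, record):
    """Map step: re-emit the node itself, plus a tentative distance for each neighbour."""
    distance, neighbours = record
    yield node, ("node", distance, neighbours)
    if distance != INF:
        for neighbour in neighbours:
            yield neighbour, ("dist", distance + 1)


def reduce_node(node, values):
    """Reduce step: keep the adjacency list and the smallest distance seen for this node."""
    best = INF
    neighbours = []
    for value in values:
        if value[0] == "node":
            neighbours = value[2]
        best = min(best, value[1])
    return node, (best, neighbours)


def bfs(graph, source):
    """Run map/reduce passes until the distances stop changing."""
    state = {n: (0 if n == source else INF, list(adj)) for n, adj in graph.items()}
    while True:
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for node, record in state.items():
            for key, value in map_node(node, record):
                grouped[key].append(value)
        new_state = dict(reduce_node(k, v) for k, v in grouped.items())
        if new_state == state:
            return {n: d for n, (d, _) in new_state.items()}
        state = new_state


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
    print(bfs(graph, "a"))
```

on a real cluster each pass would be a separate map-reduce job reading the previous job's output, and the "did anything change" check would typically be done with a job counter rather than by comparing dictionaries.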

san francisco nosql meetup

last week, i had the good fortune to attend the san francisco nosql meetup organized by johan oskarsson.

the meetup discussed the limitations of traditional relational database technology at scale and the open-source alternatives currently available with similar functionality to amazon's dynamo and google's bigtable.

exploring apache log files using hive and hadoop

if you're exploring hive as a technology, and are looking to move beyond "hello, world", here's a little recipe for a simple but satisfying first task using hive and hadoop. we'll work through setting up a clustered installation of hive and hadoop, and then import an apache log file and query it using hive's SQL-like language.

unless you happen to have three physical linux servers at your disposal, you may want to create your base debian linux servers using a virtualization technology such as xen. for a good guide on setting up xen, go here. for the remainder of this tutorial, i'll assume that you have three debian (lenny) servers at your disposal.

let's get started

log4drupal - an updated logging api for drupal 6

drupal 6 included an upgrade to the built-in logging functionality (watchdog). drupal 6 exposes a new hook, hook_watchdog, which modules may implement to log Drupal events to custom destinations. it also includes two implementations: the dblog module, which logs to the watchdog table, and the syslog module, which logs to syslog.

with these upgrades, log4drupal is a less critical addition to a drupal install, and i hesitated before providing a drupal 6 upgrade. however, eventually i decided that log4drupal is still a useful addition to a drupal development environment, as it provides the following features still not provided by the upgraded drupal 6 watchdog implementation :

  • a java-style stacktrace including file and line numbers, showing the path of execution
  • automatic recursive printing of all variables passed to the log methods
  • ability to change the logging level on the fly

in addition, the drupal 6 version of log4drupal includes the following upgrades from the drupal 5 version

  • all messages sent to the watchdog method are also output via log4drupal
  • severity levels have been expanded to conform to RFC 3164
  • log module now loaded during the drupal bootstrap phase so that messages may be added within hook_boot implementations.

you may download the drupal 6 version here. see below for general information on what this module is about and how it works.

easy-peasy-lemon-squeezy drupal 6 installation on debian linux

installing drupal is pretty easy, but it's even easier if you have a step by step guide. i've written one that will produce a basic working configuration with drupal6 on debian lenny with php5, mysql5 and apache2.

all commands that follow assume that you are the root user.

let's get started!

smartly purge your old backup files on linux

if you back up your *nix box, eventually you'll get into the business of purging your old backup files to preserve disk space. a reasonable way to do this is to use the find command to identify old backups and delete them. you should, however, consider doing something a little smarter than this.
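as a rough illustration, here is a python sketch of the age-based selection that find would do, with one possible safeguard added. the safeguard (never touch the newest few files, so a broken backup job can't leave you with nothing) is an assumption for this sketch and not necessarily the approach the full post describes. the function name and its parameters are hypothetical, and it only returns candidates instead of deleting anything.

```python
import time
from pathlib import Path


def select_backups_to_purge(backup_dir, max_age_days, keep_newest=3, now=None):
    """Return backup files older than max_age_days, always sparing the newest few.

    keep_newest is an assumed safeguard: if backups stop being produced,
    the most recent ones are never selected, however old they get.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds in a day

    # newest first, by modification time
    files = sorted(
        (p for p in Path(backup_dir).iterdir() if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    candidates = files[keep_newest:]
    return [p for p in candidates if p.stat().st_mtime < cutoff]


if __name__ == "__main__":
    # dry run: print what would be removed; call p.unlink() to actually delete
    for p in select_backups_to_purge(".", max_age_days=30):
        print("would purge", p)
```

printing the list first and only then switching to p.unlink() is a cheap way to check the selection before any file is removed.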

using google analytics advanced segments to separate direct and organic traffic

traffic to a website can be divided into four major sources : direct, paid, organic and referrals. unsurprisingly, google analytics segments the traffic sources reports accordingly.

there is, however, a small catch. the ever-growing popularity of search engines has led to an odd use case : users who use a search engine to search for exactly your domain name, instead of simply typing www.mydomain.com into their web browser. these users have just reached your site via an "organic search" and google analytics will classify them accordingly.

technically this is correct, but semantically it's troubling. the users who have reached your site by typing "mydomain" into Google have far more in common with the users that entered www.mydomain.com into their URL bar and far less in common with those users that reached your site by typing "my optimized search term" into Google. and the population of these users is not small - on one of the commercial drupal sites that i maintain these "mydomain" Google searchers account for over one third of the supposedly organic traffic.

before the release of google analytics advanced segments, one could estimate the volume of "True Organic" pageviews by starting with the organic search volume, then using the keyword report to subtract all the "mydomain" keywords (mydomain, mydomain.com, and, my personal favorite, www.mydomain.com).

thankfully, advanced segments now gives us an easy way to create a "True Direct" and "True Organic" segment - in which all the "mydomain" organic searches have been removed from the organic segment, and stuck in the direct segment instead.
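the reclassification rule itself is simple enough to sketch in python. this is only an illustration of the logic, not the google analytics segment configuration: the function name, the source labels ("direct", "paid", "organic", "referral") and the brand keyword list are all assumptions made for the example.

```python
# hypothetical list of "mydomain"-style keywords; use your own domain variants
BRAND_KEYWORDS = {"mydomain", "mydomain.com", "www.mydomain.com"}


def true_segment(source, keyword=None, brand_keywords=BRAND_KEYWORDS):
    """Move organic visits that searched for the domain name into the direct segment."""
    normalized = (keyword or "").strip().lower()
    if source == "organic" and normalized in brand_keywords:
        return "true direct"
    if source == "organic":
        return "true organic"
    if source == "direct":
        return "true direct"
    return source  # paid and referral traffic are left as they are


if __name__ == "__main__":
    visits = [
        ("organic", "www.mydomain.com"),
        ("organic", "my optimized search term"),
        ("direct", None),
        ("referral", None),
    ]
    for source, keyword in visits:
        print(source, keyword, "->", true_segment(source, keyword))
```

the match here is exact after lowercasing and trimming; a real segment would more likely use a "contains" or regular expression condition to catch misspellings of the domain.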

stokereport.com : drupal powered web 2.0 site for surfers

recently launched, stokereport.com is starting to make waves in the san francisco surfing community, as the first san francisco surf report website powered by user-generated content.

powered by drupal 5.3 under the hood, stokereport is web 2.0 to the core. all content is user-generated, and users may submit reports via SMS, Twitter, mobile web or a traditional web browser. users may post pictures with their report, and vote for their favourites. this feature has quickly led to a great collection of san francisco surf pics.

stokereport is also a bit of a "mash-up" - combining data from the national weather service, weather underground, noaa and other regional weather services to provide current and forecast conditions for swell, wind and temperature.

and finally, if you can't quite get motivated to get in the water yourself, but still like to dream, check out stokereport's user-submitted "rants" - a great collection of news, videos and offbeat fun from the world of surfing.

what is twitter and why should you care

an unfathomable number of people around the world are hooked on a new(ish) service called twitter, and an equally unfathomable number still have no idea what it is. twitter is . . . a bit hard to explain. one way to think of twitter is as a blog hosting website. however, there is one twist : each entry (called a "tweet") may be no longer than 140 characters. it also has two unique features : 1) you can send in your blog updates by sending a text message (SMS) to twitter, and 2) your friends can sign up to receive your blog updates on their phones.

to get a better idea what i'm talking about, check out the twitter feed for my crazy husband, or an important person like barack obama. or, check out a few of the many twitter visualizations. twittervision shows some very small percentage of all the tweets received, and where they are coming from. it's best to look at this at an hour of the day when asia is asleep. i also like twistori which shows all incoming tweets containing certain keywords like "wish".

tweets started out in plain text, but it didn't take long for folks to think . . . gosh . . . i'd love to include a snapshot with my random thought of the day . . . and hence was born twitpic. and, of course, there are lots of handy applications to send photo-enabled tweets from your cellphone. i like twitterlator.

how you might use twitter depends on who you are. if you are . . .

  • incapable of sending an SMS, or don't know what that is :
    forget it, stay away.

  • capable of sending an SMS, but too lazy to set up a blog
    twitter's a great way to join the nation's new favorite pastime - generating as much useless information as quickly as possible.

  • a non-technical blogger
    twitter is a great companion to a traditional blog. if you're blogging using a standard blogging technology (wordpress, blogger, etc.) then you can easily add your twitter micro-blog as a sidebar to your regular blog. it's easy, it's fun, and it keeps your blog "fresh" with little effort on your part.

  • a geek
    if you have any geeky tendencies, you'll likely rapidly develop a love/hate relationship with twitter. love the platform, hate the implementation. at the very least, you'll capture your tweets and display them (sensibly!) as part of your blog. or, hell, you might write an entire surf conditions report website that uses twitter as its underlying technology.