tech

More Digg Technical Talks

We had another couple of great external speakers talking at Digg HQ. The open-source master Doug Cutting stopped by the Digg office to talk about the growth of the Hadoop platform. He talks in detail about Avro, an RPC and serialization framework that has some interesting differences compared to the popular Thrift framework.

tokenizing twitter posts in lucene

solr lucene is good technology to use for searching over a corpus of tweets. if you take the content of a tweet and dump that into the default solr lucene "text" field, you'll do pretty well. however, if you look at your results closely, you'll find one subtle, but very annoying problem: searches on a hashtag term will match the non-hashtag term.
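as a rough illustration of the problem (not solr or lucene code, just a minimal python sketch with hypothetical function names), compare a tokenizer that throws away punctuation with one that keeps a leading '#' attached to its word:

```python
import re

def naive_tokens(text):
    # split on anything that is not a letter or digit, so '#' is discarded
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def hashtag_aware_tokens(text):
    # keep an optional leading '#' attached to the word that follows it
    return re.findall(r"#?[a-z0-9]+", text.lower())

tweet = "indexing tweets with #lucene"
print(naive_tokens(tweet))          # the hashtag and the plain word look identical
print(hashtag_aware_tokens(tweet))  # the hashtag survives as its own term
```

with the first tokenizer, a query for the hashtag and a query for the plain word reduce to the same term, which is the annoying behaviour described above. the second keeps them distinct, at the cost of a hashtag query no longer matching the plain word.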

Continuous Deployment at Digg.com

Digg's Andrew Bayer has just written a blog post describing how we use Git, Hudson, Selenium, Puppet and Gerrit to manage continuous deployment at Digg.

Andrew describes how we get developer commits to production quickly and safely using a combination of automated packaging and staging, web-based code review and automated testing (unit and Selenium).

Read the full blog here.

Digg Technical Talks

We recently started inviting external speakers to talk at Digg HQ on a variety of technical subjects. Jack Dorsey, the founder of Twitter and Square, stopped by to talk about the growth of Twitter and the resulting engineering and cultural challenges. He talks about applying the lessons learned there to his latest venture, Square. Fascinating stuff.

Digg and Drupal

We've recently started using Drupal 6 at Digg.com for all our content needs. So far so good. Everything from our jobs page, to our site tour, to the Open Source site we launched three days ago, is managed via Drupal.

Right now we're looking for someone to lead the charge on our internal Drupal development and our contributions back to the project. If you're handy with Drupal and have a passion for open source development, take a look at our latest posting on the jobs page. Read more about our use of Drupal on the Digg Blog.

Dealing with the Data Deluge

I gave a keynote presentation at Under the Radar's "Commercializing the Cloud" conference. It's about NoSQL and picking distributed database technologies that might help you with the "deluge of data" that the world is experiencing. I've included the presentation below.

A Geometric Progression of Effectiveness - The Agility of Interruptions

André Maurois (1885-1967) wrote that "The effectiveness of work increases according to geometric progression if there are no interruptions." At Digg we struggle between the clear benefits of uninterrupted work and the need to be agile in our communication.

Saying Yes to NoSQL; Going Steady with Cassandra

The last six months have been exciting for Digg's engineering team. We're working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we're also rolling out a new client and server architecture. And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP.

log4drupal now available on github

both the 5.x and 6.x versions are now available for download on github. sorry, i just can't do CVS anymore. to download:

  1. start by going here: http://github.com/cailinanne/log4drupal
  2. then click the all tags drop-down and choose the appropriate version
  3. then click the download button

a full description of the module is available here

breadth-first graph search using an iterative map-reduce algorithm

i've noticed two trending topics in the tech world today: social graph manipulation and map-reduce algorithms. in the last blog post, i gave a quickie guide to setting up hadoop, an open-source map-reduce implementation, and an example of how to use hive, a sql-like database layer on top of that. while this is one reasonable use of map-reduce, this time we'll explore its more algorithmic uses, while taking a glimpse at both of these trendy topics!
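to give a flavour of the idea, here is a minimal in-memory python sketch of breadth-first search written as repeated map and reduce passes. it does not use hadoop and is not necessarily the algorithm from the full post; the function names and the (neighbours, distance) record layout are assumptions made for illustration. each pass lets every reached node offer "my distance + 1" to its neighbours, and the reducer keeps the smallest offer. passes repeat until nothing changes.

```python
from collections import defaultdict

INF = float("inf")

def map_node(node_id, record):
    """Emit the node itself, plus a candidate distance for each neighbour."""
    neighbours, dist = record
    yield node_id, (neighbours, dist)
    if dist != INF:
        for n in neighbours:
            # an empty adjacency list marks this value as a distance-only message
            yield n, ([], dist + 1)

def reduce_node(node_id, values):
    """Keep the adjacency list and the smallest distance seen for this node."""
    neighbours, best = [], INF
    for nbrs, dist in values:
        if nbrs:
            neighbours = nbrs
        best = min(best, dist)
    return neighbours, best

def bfs(graph, source):
    """Run map/reduce passes until the distances stop changing."""
    state = {n: (nbrs, 0 if n == source else INF) for n, nbrs in graph.items()}
    while True:
        grouped = defaultdict(list)  # stands in for the shuffle phase
        for node_id, record in state.items():
            for key, value in map_node(node_id, record):
                grouped[key].append(value)
        new_state = {k: reduce_node(k, v) for k, v in grouped.items()}
        if new_state == state:
            return {n: d for n, (_, d) in new_state.items()}
        state = new_state

graph = {1: [2, 3], 2: [4], 3: [4], 4: [5], 5: []}
print(bfs(graph, 1))
```

a simplification to note: every reached node re-emits to its neighbours on every pass, not only the current frontier. the distances come out the same, but a real job would usually track the frontier to avoid the redundant messages.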
