john quinn's blog

welcome to john quinn's blog

you can read about some of the technical stuff i've been working on recently, including scalability, xen, drupal and linux.

why not subscribe to my technical blog or check out some of my photographs, or go away and check out ava's blog or surfing conditions at ocean beach in san francisco.

there's more about me on our about us page. don't hesitate to contact me about anything, or follow me on twitter.



Saying Yes to NoSQL; Going Steady with Cassandra

The last six months have been exciting for Digg's engineering team. We're working on a soup-to-nuts rewrite. Not only are we rewriting all our application code, but we're also rolling out a new client and server architecture. And if that doesn't sound like a big enough challenge, we're replacing most of our infrastructure components and moving away from LAMP.

san francisco nosql meetup

last week, i had the good fortune to attend the san francisco nosql meetup organized by johan oskarsson.

the meetup discussed the limitations of traditional relational database technology at scale and the open-source alternatives currently available with similar functionality to amazon's dynamo and google's bigtable.

smartly purge your old backup files on linux

if you back up your *nix box, eventually you'll get into the business of purging your old backup files to preserve disk space. a reasonable way to do this is to use the find command to identify old backups and delete them. you should, however, consider doing something a little smarter than this.
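as a rough illustration of what "a little smarter" could mean, here is a minimal python sketch. it is not the script from the article: the function names, the `*.tar.gz` pattern, the 30-day cutoff and the "always keep the newest three" rule are all assumptions made for the example. the idea is that an age-based purge should never delete your last good backups just because the backup job stopped running a while ago.

```python
import time
from pathlib import Path


def select_backups_to_purge(paths_with_mtime, now, max_age_days=30, keep_min=3):
    """Pick backups older than max_age_days, but always spare the newest keep_min.

    paths_with_mtime is a list of (path, modification_time_in_seconds) pairs.
    All names and defaults here are hypothetical, chosen for illustration.
    """
    newest_first = sorted(paths_with_mtime, key=lambda item: item[1], reverse=True)
    cutoff = now - max_age_days * 86400
    purge = []
    for index, (path, mtime) in enumerate(newest_first):
        if index < keep_min:
            continue  # protect the newest backups even if they are already old
        if mtime < cutoff:
            purge.append(path)
    return purge


def purge_old_backups(directory, pattern="*.tar.gz", max_age_days=30,
                      keep_min=3, dry_run=True):
    """Apply the selection to a directory; only deletes when dry_run is False."""
    files = [(p, p.stat().st_mtime) for p in Path(directory).glob(pattern) if p.is_file()]
    doomed = select_backups_to_purge(files, time.time(), max_age_days, keep_min)
    for path in doomed:
        if dry_run:
            print(f"would delete {path}")
        else:
            path.unlink()
    return doomed
```

running `purge_old_backups` with the default `dry_run=True` only prints what would be removed, which is a sensible first step before letting anything delete files from cron.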

creating an agile engineering work space at digg

digg has been growing like crazy, creating space problems for our development department, resulting not only in excessive density, but in our team being spread across several floors. these space issues have caused significant challenges for our emerging agile development environment, stifling informal communication and making it difficult to organize teams into cohesive groups.

amazon release their elastic block store, ebs

a while ago i posted some performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud.

amazon have just released ebs, the final piece of technology that makes their ec2 platform really viable for running lamp stacks such as drupal.

ebs, the "elastic block store", provides sophisticated storage for your database instance, with features including:

  • high io throughput
  • data replication
  • large storage capacity
  • hot backups using snapshots
  • instance type portability e.g. quickly swapping your database hardware for a bigger machine.

laying the agile groundwork at digg

i'm currently working on re-engineering many of the development processes at digg. we're adopting a number of practices from the agile world that complement the type of development that we do. these practices include: build automation, automated deployment, daily scrums, short releasable time-boxed iterations, simple design, refactoring, and just-in-time specification.

we've decided to be agile about our adoption of our new agile processes, introducing them incrementally, measuring the results along the way and iterating as necessary.

as we worked through the cascading dependencies of our adoption path, it quickly became clear that automated testing had to be a cornerstone practice if we were going to make a success of the others.

a new jmeter book from packt

recently i posted a couple of introductory articles on jmeter, a great apache open-source tool that allows you to measure the performance and scalability of a wide variety of services, especially web-applications.

i wrote these articles because although the online documentation provides reasonable reference material, it doesn't serve well as a jmeter introduction or tutorial.

things have changed a bit since then. the uk-based publishing house packt publishing were kind enough to send me a copy of emily halili's newly published book on jmeter, which, as far as i can tell, is the first book dedicated to the subject.

digg made me an iPerson

i started working at digg.com a few months ago. on my first day i was handed a brand new macbook pro. a reasonable person would have been delighted, but i was filled with dread. i had started a new high-pressure job, and the only tool at my disposal was a mac. not only that, i shuddered at the very thought of becoming an iPerson.

my most recent experience with apple had been extended subjection to a dog-slow powerpc running mac os 7. ever since, i had assumed macs to be fisher-price type devices, designed for those too misguided or incompetent to operate even a second-rate device like a wintel box.

lamp on amazon ec2 shaping up nicely

recently i posted some encouraging performance benchmarks for drupal running on a variety of servers in amazon's elastic compute cloud. while the performance was encouraging, the suitability of this environment for running lamp stacks was not. ec2 had some fundamental issues including a lack of static ip addresses and no viable persistent storage mechanism.

amazon are quickly rectifying these problems, and recently announced elastic ip addresses: a "static" ip address that you own and can dynamically point at any of your instances.

today amazon indicated that persistent storage will soon be available.

zicasso launches drupal-powered web2.0 travel site

three weeks ago, zicasso.com launched a drupal-powered free personalized online travel service that aims to connect travelers to a global network of quality, pre-screened travel companies. unlike many internet travel sites which provide cheap fares or packages, zicasso is targeted at busy, discerning travelers who want to plan and book complex trips (the ones with multiple destination stops or activities).

zicasso was favorably reviewed in popular web publications including pc magazine, techcrunch, ars technica and the san jose business journal.

zicasso chose to build their application using the open-source cms system, drupal, to leverage the wide array of web2.0 functionality provided by the open source community.

the application was rapidly constructed by a small development team led by cailin nelson and jenny dickinson. the team took advantage of "core" drupal modules including cck, panels, views, imagecache, workflow and actions.

backing up your xen domains

backups are boring, but we all know how important they are. backups can also be quite powerful when working with xen virtualization, since xen allows for convenient back-up and restore of entire systems.

i've recently been working on xenBackup, a flexible, general-purpose script enabling incremental backups of complete xen guests, optimized for secure, distributed environments. if you're working with xen, you might find it useful.

the xenBackup script leverages open-source components like ssh, rsync, and rdiff-backup to create a simple, efficient and functional solution.

lamp performance on the elastic compute cloud: benchmarking drupal on amazon ec2

amazon's elastic compute cloud, "ec2", provides a flexible and scalable hosting option for applications. while ec2 is not inherently suited for running application stacks with relational databases such as lamp, it does provide many advantages over traditional hosting solutions.

in this article we get a sense of lamp performance on ec2 by running a series of benchmarks on the drupal cms system. these benchmarks establish read throughput numbers for logged-in and logged-out users, for each of amazon's hardware classes.

we also look at op-code caching, and gauge its performance benefit in cpu-bound lamp deployments.

load test your drupal application scalability with apache jmeter: part two

i recently posted an introductory article on using jmeter to load test your drupal application. if you've read this article and are curious about how to build a more sophisticated test that mimics realistic load on your site, read on.

the previous article showed you how to set up jmeter and create a basic test. to produce a more realistic test you should simulate "real world" use of your site. this typically involves simulating logged-in and logged-out users browsing and creating content. jmeter has some great functionality to help you do this.

load test your drupal application scalability with apache jmeter

there are many things that you can do to improve your drupal application's scalability, some of which we discussed in the recent scaling drupal - an open-source infrastructure for high-traffic drupal sites article.

when making scalability modifications to your system, it's important to quantify their effect, since some changes may have no effect or even decrease your scalability. the value of advertised scalability techniques often depends greatly on your particular application and network infrastructure, sometimes creating additional complexity with little benefit.

apache jmeter is a great tool to simulate load on your system and measure performance under that load. in this article, i demonstrate how to set up a testing environment, create a simple test and evaluate the results.

how to set up real-time email-notification for critical syslog events

a few weeks ago, i wrote a short article about the advantages of using syslog for all your logging needs. syslog is the standard logging solution for *nix platforms and integrates into virtually all application servers, network devices, and programming languages.

it is often important for system administrators to get real-time notification of critical events. unfortunately, it isn't immediately obvious how to do this in the syslog framework. in this article i show you step-by-step how to do this.
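to give a feel for what "critical" means in syslog terms, here is a small python sketch. it is not the setup from the article (which this teaser doesn't describe); it only shows the filtering decision. it assumes you are looking at raw syslog messages as sent over the wire, which start with a `<PRI>` value equal to facility * 8 + severity, with severity 0 (emerg) being the most urgent and 7 (debug) the least. the function names and the default "crit" threshold are made up for the example.

```python
import re

# syslog severities in order, index 0 is the most urgent
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
PRI_PATTERN = re.compile(r"^<(\d{1,3})>")


def parse_priority(message):
    """Return (facility_number, severity_name), or None if there is no valid <PRI>."""
    match = PRI_PATTERN.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # largest valid value: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)
    return facility, SEVERITIES[severity]


def needs_notification(message, threshold="crit"):
    """True when the message is at least as urgent as the threshold severity."""
    parsed = parse_priority(message)
    if parsed is None:
        return False
    return SEVERITIES.index(parsed[1]) <= SEVERITIES.index(threshold)
```

anything for which `needs_notification` returns true is what you would hand to your mailer; how the mail gets sent is deliberately left out of this sketch.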

supercharge your css code with m4

css has vastly improved the quality of html markup on the web. however, given its complexity, it has some astounding deficiencies.

one of the biggest problems is the lack of constants. how many times have you wanted to code something like this? light_grey = #CCC. instead you are forced to repeat #CCC in your css. this quickly creates difficult-to-maintain and difficult-to-read code.

an elegant solution to the problem is to use a general purpose preprocessor like m4. m4 gives you a full range of preprocessing capability, from simple constants to sophisticated macros.
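to make the idea of constants concrete, here is a tiny python sketch of the same substitution step. to be clear, this is not m4 and not the approach from the article; it just mimics what a preprocessor does with a simple constant such as light_grey. the function name and the dictionary of constants are assumptions for the example.

```python
import re


def expand_constants(css, constants):
    """Replace whole-word occurrences of each constant name with its value."""
    if not constants:
        return css
    # try longer names first so one name cannot shadow a longer one
    names = sorted(constants, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(name) for name in names) + r")\b")
    return pattern.sub(lambda match: constants[match.group(1)], css)


source = "body { color: light_grey; }\na { border: 1px solid light_grey; }"
print(expand_constants(source, {"light_grey": "#CCC"}))
```

the value #CCC now lives in exactly one place. a real preprocessor like m4 goes well beyond this, since it also gives you macros with arguments.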

using a guardian to ensure your lamp site is always up

to guarantee maximum uptime for your site, it's a good idea to periodically check the health of your system and restart failing components. you can use a simple program to do this automatically. i like to call this type of program a "guardian".

clearly guardians shouldn't be used as a crutch for a badly configured system. used appropriately, however, they can decrease downtime due to unexpected events or administrator error.

in this article, i describe how to implement, install and configure a guardian using a lightweight bash script. i go on to describe how to watch over your lamp install using this guardian. please note that all code and configurations have been tested on debian etch but should be useful for other *nix flavors with subtle modifications.
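the article itself uses a bash script; purely to illustrate the check-and-restart loop, here is a minimal sketch in python instead. the function names, the 60 second interval and the shape of the `components` list are assumptions for the example, and the actual health-check and restart commands for your lamp stack are deliberately left for you to supply.

```python
import subprocess
import time


def guard_once(name, is_healthy, restart, log=print):
    """Check one component; restart it if unhealthy. Returns True if restarted."""
    if is_healthy():
        return False
    log(f"{name} is unhealthy, restarting")
    restart()
    return True


def command_succeeds(command):
    """Health-check helper: run a command (a list of arguments), True on exit code 0."""
    result = subprocess.run(command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def watch(components, interval_seconds=60, cycles=None):
    """Loop over (name, is_healthy, restart) triples; cycles=None runs forever."""
    done = 0
    while cycles is None or done < cycles:
        for name, is_healthy, restart in components:
            guard_once(name, is_healthy, restart)
        done += 1
        if cycles is None or done < cycles:
            time.sleep(interval_seconds)
```

keeping the health check and the restart action as plain callables means the same loop can watch apache, mysql or anything else, and it makes the guardian easy to try out without touching a real service.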

never lose your data again: backup remotely using rsync ssh and rdiff-backup

if you've ever lost precious data after a hard drive failure, you've probably learned your lesson and are now automatically backing up your system.

your treasured pictures, videos and documents may still be at risk. your computer could be stolen, destroyed by flood or fire or chopped into small pieces by a jealous ex-lover.

using a remote backup service is a good way to mitigate against this type of problem. for around $10 a month, you can find companies willing to store 10GB of data for you. your data is usually accessible using a variety of methods, including rsync, vpn and ftp. to see some of these services, type remote backup rsync service into google.

in this article, i discuss using open source software to take advantage of these services in an efficient and secure manner, allowing the backup of large directories over a dsl-speed line while you sleep.

i am not a yogi. really.

two years ago my neck was in such bad shape that i thought i might have to quit surfing. after much nagging and cajoling my girlfriend, cailin, persuaded me to try yoga. i didn't want to go. what good was it going to do me to sit cross-legged in a hot room with a bunch of girls in tie-dye leggings discussing herbal teas?

despite my reservations the two of us headed off to laura camp's class (cailin's longtime friend) at monkey yoga in the east bay. 30 minutes into the class my concerns were not of the herbal-tea variety, but more of a "how do i get the hell out of here, this is kicking my ass" variety.

the fantastic four - drupal's unofficial core

using the term "content management system" to describe the drupal cms understates its full potential. i prefer to consider drupal a web-application development-system, particularly suitable for content-heavy projects.

what are the fantastic four?

drupal's application development potential is provided in large part by a set of "core" modules that dovetail to provide an application platform that other modules and applications build on. these modules have become a de-facto standard: drupal's fantastic four. our superheroes are cck, views, panels and cck field types and widgets. if you are considering using drupal to build a website of any sophistication, you can't overlook these.

scaling drupal step four - database segmentation using mysql proxy

if you've set up a clustered drupal deployment (see scaling drupal step three - using heartbeat to implement a redundant load balancer), a good next step is to scale your database tier.

in this article i discuss scaling the database tier up and out. i compare database optimization and different database clustering techniques. i go on to explore the idea of database segmentation as a possibility for moderate drupal scaling. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.
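to illustrate what segmentation means in practice, here is a small python sketch of the routing decision only. it is not mysql proxy and not the configuration from the article: the table-to-server mapping, the backend names "main" and "logs", and the very naive sql parsing are all assumptions made for the example. the point is simply that some tables live on a different database server, and each query has to be sent to the server that holds its tables.

```python
import re

# naive: grabs the identifier after from/into/update/join; fine for a sketch only
TABLE_PATTERN = re.compile(r"\b(?:from|into|update|join)\s+`?(\w+)`?", re.IGNORECASE)


def tables_in(sql):
    """Return the (lower-cased) table names a simple statement refers to."""
    return [name.lower() for name in TABLE_PATTERN.findall(sql)]


def choose_backend(sql, segments, default="main"):
    """Pick the backend that holds every table in the query.

    segments maps table name -> backend name; unlisted tables live on default.
    """
    backends = {segments.get(table, default) for table in tables_in(sql)}
    if not backends:
        return default  # no table found, e.g. "SELECT 1"
    if len(backends) > 1:
        raise ValueError("query spans segments and cannot be served by one backend")
    return backends.pop()


# hypothetical split: logging tables move to their own server
segments = {"watchdog": "logs", "accesslog": "logs"}
print(choose_backend("SELECT * FROM watchdog WHERE wid = 1", segments))
print(choose_backend("UPDATE node SET title = 'x' WHERE nid = 1", segments))
```

the `ValueError` branch shows the main cost of segmentation: a query that joins tables from two segments can no longer be answered by a single server, so you have to choose your split with the application's joins in mind.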

scaling drupal step one B - nfs vs rsync

i got some good feedback on my dedicated data server step towards scaling. kris buytaert in his everything is a freaking dns problem blog points out that nfs creates an unnecessary choke point. he may very well have a point.

having said that, i have run the suggested configuration in a multi-web-server, high-traffic production setting for 6 months without a glitch, and feedback on his blog gives examples of other large sites doing the same thing. for even larger configurations, or if you just prefer, you might consider another method of synchronizing files between your web servers.

beef up your drupal security with apache mod_rewrite and SSH

if you felt a waft of cold air when you read the recent highly critical drupal security announcement on arbitrary code execution using install.php, you were right. your bum was hanging squarely out of the window, and you should probably consider beefing up your security.

drupal's default exposure of files like install.php and cron.php presents inherent security risks, for both denial-of-service and intrusion. combine this with critical administrative functionality available to the world, protected only by user defined passwords, broadcast over the internet in clear-text, and you've got potential for some real problems.

bustin surfboards - are carbon fiber boards all they are cracked up to be?

a few weeks ago at ocean beach in san francisco, i had 20 minutes to kill before heading out for a surf session. i wandered into wise surfboards to check out what was new in surf gear.

my eyes immediately fell on the new aviso carbon boards. you'd be forgiven for thinking: if batman surfed, this is what he would ride. matte black, light, strong, but flexible. how could anyone resist? easy. the price. they're priced at roughly 2x what you'd pay for a regular board.

scaling drupal - an open-source infrastructure for high-traffic drupal sites

the authors of drupal have paid considerable attention to performance and scalability. consequently even a default install running on modest hardware can easily handle the demands of a small website. my four year old pc in my garage running a full lamp install will happily serve up 50,000 page views in a day, providing solid end-user performance without breaking a sweat.

when the time comes for scalability: moving out of the garage

if you are lucky, eventually the time comes when you need to service more users than your system can handle. your initial steps should clearly focus on getting the most out of the built-in drupal optimization functionality, considering drupal performance modules, optimizing your php (including considering op-code caching) and working on database performance. John VanDyk and Matt Westgate have an excellent chapter on this subject in their new book, "pro drupal development".

once these steps are exhausted, inevitably you'll start looking at your hardware and network deployment.

scaling drupal step three - using heartbeat to implement a redundant load balancer

if you've set up a clustered drupal deployment (see scaling drupal step two - sticky load balancing with apache mod_proxy), a good next step is to cluster your load balancer.

one way to do this is to use heartbeat to provide instant failover to a redundant load balancer should your primary fail. while the method suggested below doesn't increase the load balancer's scalability, which shouldn't be an issue for a reasonably sized deployment, it does increase your redundancy. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

scaling drupal step two - sticky load balancing with apache mod_proxy

if you've set up your drupal deployment with a separate database and web (drupal) server (see scaling drupal step one - a dedicated data server), a good next step is to cluster your web servers. drupal generates a considerable load on the web server and can quickly become resource constrained there. having multiple web servers also increases the redundancy of your deployment. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

creating a xen bridging interface

in my previous blog, i go over a simple how-to for setting up xen on etch. in this configuration the xen guests are only visible to the xen-host, and any services on the xen-hosts must be accessed via port forwarding, tunneling etc.

for some applications, a bridging configuration works better. you can set this up as follows:

scaling drupal step one - a dedicated data server

if you've already installed drupal on a single node (see easy-peasy-lemon-squeezy drupal installation on linux), a good first step to scaling a drupal install is to create a dedicated data server. by dedicated data server i mean a server that hosts both the database and a fileshare for node attachments etc. this splits the database server load from the web server, and lays the groundwork for a clustered web server deployment. here's how you can do it. as usual, my examples are for apache2, mysql5 and drupal5 on debian etch. see the scalability overview for related articles.

easy-peasy-lemon-squeezy drupal installation on linux

installing drupal is pretty easy, but it's even easier if you have a step by step guide. i've written one that will produce a basic working configuration with drupal5 on debian etch with php5, mysql5 and apache2. it might be a help on other configurations too. see the scalability overview for related articles.
