Wednesday, 16 November 2011

Fedora 16 - Kile installation

Kile is a fantastic editor for creating LaTeX documents. It is part of the KDE suite of applications, but runs perfectly fine under Gnome when the KDE libraries are installed.

Unfortunately, the Kile package in Fedora 16 appears to be broken. When trying to start the application, a dialog box pops up with the message: "No editor component found. Please check your KDE installation". The application still manages to load, but is missing all icons. Attempting to open a file causes Kile to crash.

Through trial and error and a nudge in the right direction from the Arch forum, I managed to fix Kile by installing two more packages.

sudo yum install kate-libs libkate

Monday, 14 November 2011

Enable graphical plymouth boot in Fedora 16 with nvidia drivers

Installation of the nvidia drivers under Fedora is well documented. In previous versions of Fedora, enabling the plymouth graphical boot was quite easy. All one had to do was add the following kernel arguments to grub.conf.
rdblacklist=nouveau nomodeset vga=ask

Starting from release 16, Fedora uses Grub2 as the bootloader. The unfortunate side effect of this change is that the vga parameter is now obsolete. I absolutely hate the ugly text-based boot screen - but going back to the nouveau driver was not an option (simply because I am as stubborn as a mule). So after a lot of tinkering, here's how I managed to get back the graphical boot screen.

DISCLAIMER: Modifying Grub configuration files could make your system unbootable. Proceed at your own risk.
  1. Find the video modes supported by the vbe driver. To do this, press "c" at the grub menu to launch the grub console and run the following commands:
    set pager=1
    insmod vbe
    vbeinfo

    The vbeinfo command lists all the accessible video modes it finds. Choose one that you like, e.g. 1280x800x24. Press ESC to exit the console and hit ENTER to boot the OS.

  2. Create a Grub2 font. I chose the excellent DejaVu font as the example here. Feel free to choose a different font in its place.
    sudo grub2-mkfont --output=/boot/grub2/DejaVuSansMono.pf2 --size=24 /usr/share/fonts/dejavu/DejaVuSansMono.ttf

  3. Edit /etc/default/grub and add the following lines to the end

    Set GRUB_FONT_PATH to the filename of the font you generated in the previous step. GRUB_GFXMODE should be the mode you chose from the output of the vbeinfo command.

  4. Back up /boot/grub2/grub.cfg and regenerate it using the new settings.
    sudo cp /boot/grub2/grub.cfg /boot/grub2/grub.cfg.bkp
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg

  5. Reboot and enjoy the graphics.

Firefox font smoothing in Linux

After upgrading to Fedora 16, I noticed that the font rendering looked awful in Firefox. Even an upgrade to Firefox Aurora made no difference. Purely out of desperation, I tried a method I had used previously to fix a similar issue with Chrome, and hit the jackpot!

The original post is by Zach Beane, and all credit should go to him.

Save the following to ~/.fonts.conf:
<match target="font">
<edit name="autohint" mode="assign">
<edit name="hinting" mode="assign">
<edit mode="assign" name="hintstyle">

Installing Caffeine 2.4.1 on Fedora 16

Regular readers probably know that I am a big fan of the Caffeine project. I even contributed the Gnome Shell patches to it. After a fresh install of Fedora 16 "Verne", I attempted to install the latest version of Caffeine from the project page - which at the moment is 2.4.1 - and ran into a few problems. So for anyone interested, here's how to install the latest version from source.

  1. Download the tarball from Launchpad and extract it:
    mkdir caffeine 
    tar xvf caffeine_2.4.1%2B419~oneiric1.tar.gz --strip-components 1 -C caffeine
    cd caffeine
  2. Run the setup scripts:
    python setup.py build
    sudo python setup.py install
  3. Update the icon cache and GSettings schemas:
    sudo gtk-update-icon-cache /usr/share/icons/hicolor/
    sudo glib-compile-schemas /usr/share/glib-2.0/schemas/
  4. Correct the permissions:
    sudo chmod 744 /usr/share/caffeine/images/*
    sudo chmod 744 /usr/share/caffeine/glade/*
  5. To add Caffeine to startup items, run the following command and click the "Add" button. The path to the executable is /usr/bin/caffeine

Monday, 17 October 2011

Gnome Shell Cheat Sheet

Some very handy tips in this one. In particular, I wasn't aware that I could use the scroll wheel to zoom in on windows in the overview mode.

Friday, 14 October 2011

GTIN Validation with Python

GTINs (Global Trade Item Numbers) are ubiquitous. We commonly see them as barcodes on products. They come in several different types and names, such as EAN, UPC and ISBN.

Recently, I needed to validate a set of GTINs stored in a file. To my surprise (unless my Google-fu is getting weak), I could not find any libraries written in Python for doing this. The algorithm is simple enough to knock together in a few minutes, but I think the requirement is common enough to warrant a ready-made library.

My first attempt at this is a module that can validate GTIN-8, GTIN-12, GTIN-13 and GTIN-14 codes in either numeric or string form. Dashes in the code (as is common with ISBN numbers) are supported.
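For reference, the check digit rule can be sketched in a few lines of Python. This is only a minimal sketch of the standard GTIN check digit calculation, not the code of the module mentioned above; the function names gtin_check_digit and is_valid_gtin are made up for this illustration.

```python
def gtin_check_digit(body):
    """Return the check digit for a GTIN body (every digit except the last)."""
    total = 0
    # Walk right to left: the digit next to the check digit has weight 3,
    # the one before it has weight 1, and the weights keep alternating.
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code):
    """Validate a GTIN-8, GTIN-12, GTIN-13 or GTIN-14 given as int or str."""
    # Dashes are dropped so that ISBN-style formatting is accepted.
    digits = str(code).replace("-", "").strip()
    if len(digits) not in (8, 12, 13, 14):
        return False
    if not all(char in "0123456789" for char in digits):
        return False
    return gtin_check_digit(digits[:-1]) == int(digits[-1])


if __name__ == "__main__":
    for sample in ("4006381333931", "978-0-306-40615-7", "4006381333932"):
        print(sample, is_valid_gtin(sample))
```

One caveat with numeric input: an integer cannot carry leading zeros, so a code such as a UPC that starts with 0 is safer to pass as a string.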

I have never worked on a publicly available Python module before. Neither am I a professional Python developer. The chances are that there are certain parts in the code that do not conform to conventions. However, it's fully open source - so anybody can contribute to make it better.

Wednesday, 14 September 2011

Windows 8 Developer Preview

Microsoft has just released the Developer Preview of Windows 8.

I had a gander by installing it on a VM. The Metro UI looks quite good and I think it's great that most vendors are moving towards HTML5, CSS and JavaScript based apps. (Well, at least they are mostly standardised, and for once many giants of the industry are on common ground.) It's a shame that there is no common OS interface standard though. I would quite like to have my Gnome shell extensions work with little or no modification on Windows 8. One can only dream, I suppose.

One thing that drives me up the wall is the trend that OS developers seem to follow: not all users are high school kids who spend their whole time on social networks. Think of the power users dammit! I can't imagine having a very productive workflow on Windows 8. Hopefully I won't ever be forced to use it extensively - I have only just grudgingly gotten used to Gnome shell.

Yes, I am getting old.

Thursday, 7 July 2011

Using System.getProperty() with Jenkins builds

One of my recent projects needed to be able to manipulate file paths according to a set of rules and read and process files directly from the file system. When it came to writing unit tests for this module, I was faced with a dilemma. My usual approach for this kind of system is to use the "Test Double" pattern to hide environment specific details from the tests to make them portable across different platforms and different developer environments. However, the module was about 80% file processing code - which made that approach impractical. So I ended up with a suite of tests that required absolute file paths to work properly.

At this point I should mention how it's near impossible to obtain an absolute file path in Java - especially if you are trying to be as portable as possible. All my test data was in the src/test/resources/testjobs directory of the project. The absolute path to this was: /home/charithe/workspace/mercury/src/test/resources/testjobs. Obviously, when somebody else checks out this code, the first part of this path would be something else entirely. I played around quite a bit with ClassLoader.getResource et al, but had no luck whatsoever in making the path portable. In desperation, I even tried accessing Maven properties directly - which, as is to be expected, didn't work out.

With time running out, I did the quickest hack imaginable. I added a custom JVM property named testdata.path to the run configuration and the test suite was changed to call System.getProperty("testdata.path") to retrieve the first half of the path, which was then prepended to the unchanging portion of the path to get the full absolute path. This worked out quite well. I could run the tests successfully from within Eclipse by adding the property definition to the run configuration and ditto for maven with the command:
mvn test -Dtestdata.path=/home/charithe/workspace/mercury

It was a short reprieve though. Once my code was checked in, the Jenkins build failed miserably. My custom property was resolving to null even though I had added it explicitly to the Maven command line of the Jenkins job configuration. A quick Google search revealed that this is a known bug/feature of Jenkins.

The fix was quite simple. In fact, it was so simple that when I found it, I had to bang my head against the desk for not figuring it out earlier. The Maven Surefire plugin has the ability to pass properties to the unit tests. This can be done through the plugin configuration section of the pom. The relevant section of my pom.xml looks as follows:


Thursday, 16 June 2011

Oracle client 11g installation under wine

I am a steadfast Fedora user both at home and at work. However, most of my colleagues tend to use Ubuntu and one of the problems I have come across a few times is the inability to install the Oracle client software under wine in Ubuntu. It all works fine under Fedora, so why does it fail on Ubuntu with the error message "java.lang.NullPointerException at oracle.sysman.oii.oiin.OiinNetOps.addNICInfo("? I set out to find out.

After verifying that the installation error is reproducible and not just caused by a weird configuration specific to a single machine, I poked around the Oracle installation files to try to get an idea about how this error was occurring. Turns out that it is quite easy to reproduce. The following Java class does exactly that:


import java.net.*;

public class Test {
	public static void main(String[] args) {
		try {
			InetAddress ia = InetAddress.getLocalHost();
			NetworkInterface nic = NetworkInterface.getByInetAddress(ia);
			String nicName = nic.getDisplayName(); // NPE when nic is null
		} catch (Exception e) { e.printStackTrace(); }
	}
}

The above code runs perfectly fine on a Fedora installation but generates a NullPointerException on an Ubuntu installation at the point where the nic.getDisplayName() method is called. This is caused by an oddity in the network configuration of Debian based distributions. To avoid problems with fully qualified domain name resolution, Debian based workstations automatically get an extra entry in /etc/hosts for localhost.localdomain.

Note that the address in this entry is not the loopback address. Therefore, the local host's IP address ends up resolving to a valid "non-special" IP address. This in turn causes the above Java snippet to fail because, quite correctly, it cannot detect a NIC bound to that address on the local machine.
The solution is to either delete the above line from /etc/hosts or to add a new line as follows:
echo `hostname -i` "  "  `hostname -f` >> /etc/hosts

The Java code now runs without any exceptions natively, but under wine, it still causes the NPE. It appears that whatever native library used by Java (javanet.dll?) is not functioning properly under wine. In fact, under wine, Java cannot see any network interfaces at all. This can be proven by running the following piece of Java code under wine:

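import java.net.*;
import java.util.Enumeration;

public class Test {
	public static void main(String[] args) {
		try {
			Enumeration<NetworkInterface> nicList = NetworkInterface.getNetworkInterfaces();
			while (nicList.hasMoreElements()) {
				NetworkInterface nic = nicList.nextElement();
				Enumeration<InetAddress> ipList = nic.getInetAddresses();
				while (ipList.hasMoreElements())
					System.out.println(nic.getName() + "-" + ipList.nextElement());
			}
		} catch (SocketException e) { e.printStackTrace(); }
	}
}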

So how does the Oracle installer run fine under Fedora and not under Ubuntu? I can't find a logical explanation for it. The only difference seems to be that the version of wine in the Ubuntu repositories is 1.2.x while Fedora ships with 1.3.x. (As always, Fedora ships with the bleeding edge development version of wine whilst Ubuntu ships with the stable release). To test whether version makes a difference, I uninstalled the stock wine installation from Ubuntu, downloaded the latest wine source (1.3.22) and compiled it as follows:

sudo apt-get -y remove wine && sudo apt-get autoremove
rm -rf ~/.wine
sudo apt-get -y install flex bison libx11-6 libx11-dev libfreetype6 libfreetype6-dev
tar xvf wine-1.3.22.tar.bz2
cd wine-1.3.22/tools

On my dual core laptop with 2GB of RAM, compilation took about 1 hour.
Trying to run the Oracle client installer with the newly compiled version of wine still generates the old NPE, BUT, the installer doesn't crash there like before and actually continues to work! There's no logical explanation that I can think of for this strange behaviour, and frankly I am not too bothered about it. The important point is that I can finally mark this problem as solved and get some closure. :)

Tuesday, 14 June 2011

Reading and writing GSettings from Python

Here's the class I wrote to read/write the screensaver settings from GSettings. It's probably not in the best Python style, but it illustrates the idea.

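from gi.repository import Gio, GLib

class GnomeScreenLock:

    IDLE_DELAY_SCHEMA = 'org.gnome.desktop.session'
    IDLE_DELAY_KEY = 'idle-delay'

    IDLE_ACTIVATION_SCHEMA = 'org.gnome.desktop.screensaver'
    IDLE_ACTIVATION_KEY = 'idle-activation-enabled'

    # Assumption: the truncated "gsettings =" assignments and the two setter
    # bodies are restored with the usual PyGObject calls.
    def getIdleDelay(self):
        gsettings = Gio.Settings.new(self.IDLE_DELAY_SCHEMA)
        return gsettings.get_value(self.IDLE_DELAY_KEY).get_uint32()

    def setIdleDelay(self, delaySeconds):
        gsettings = Gio.Settings.new(self.IDLE_DELAY_SCHEMA)
        gsettings.set_value(self.IDLE_DELAY_KEY, GLib.Variant.new_uint32(delaySeconds))

    def isIdleActivationEnabled(self):
        gsettings = Gio.Settings.new(self.IDLE_ACTIVATION_SCHEMA)
        return gsettings.get_boolean(self.IDLE_ACTIVATION_KEY)

    def setIdleActivationStatus(self, activation):
        gsettings = Gio.Settings.new(self.IDLE_ACTIVATION_SCHEMA)
        gsettings.set_boolean(self.IDLE_ACTIVATION_KEY, activation)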

Disabling screensaver/lock-screen on Gnome 3 during Flash movies

With the release of Gnome 3, most of the old Gnome APIs have undergone major changes. Many of these changes are not backward compatible at all. This presents an interesting challenge, especially at this still-early stage of Gnome 3 development - where almost everything is in a state of flux.

I regularly watch tech presentations from conferences - which are usually presented as Flash videos. Annoyingly, this means that the screen-lock will kick in every few minutes and blank out the screen if I forget to move the mouse around. Caffeine was the perfect answer for this situation before, but unfortunately, due to API differences, it was no longer working on Gnome 3. Setting the lock-screen activation delay to its maximum value (1 hour - if using the GUI) was not an acceptable solution because I want the screen to blank out and lock itself quite quickly if I am away from the computer for a while.

My investigation into the Gnome lock-screen internals first led me to the following two dconf settings that are related to idle activation:

  • org.gnome.desktop.session.idle-delay - Number of seconds of idle activity before the screen is locked.

  • org.gnome.desktop.screensaver.idle-activation-enabled - Set whether idle detection is enabled.

Existing values for these can be obtained by running the commands:
gsettings get org.gnome.desktop.session idle-delay
gsettings get org.gnome.desktop.screensaver idle-activation-enabled

Values can be changed by using the following commands:
gsettings set org.gnome.desktop.session idle-delay 1800 
gsettings set org.gnome.desktop.screensaver idle-activation-enabled false

On my first attempt, I wrote a Python script that detects when Flash is active and changes the above settings to control the screen-lock activation. However, changes made to GSettings don't seem to propagate back to the relevant components in a timely manner, so the behaviour was mostly unpredictable. It didn't feel like a very elegant solution to my problem either.

Going back to the drawing board, I spent a few nights hunting for documentation about Gnome ScreenSaver and SessionManager. It should be mentioned that the one thing that Gnome 3 lacks the most is documentation. I finally ended up reading the Totem media player source code to figure out how it was disabling the screensaver during media playback. It turns out that in Gnome 3, the DBus interface for inhibiting the screensaver had moved from the org.gnome.ScreenSaver interface to the org.gnome.SessionManager interface. The method signature has also changed. It now requires an application_id (a Gnome specific identifier for applications that are currently running) and the toplevel XID (X Windows handle) of the application in addition to the inhibit reason and the flags. Some nice documentation on this is available online.

This revelation about the DBus solution led to further questions:
Q) How can I check the DBus interface on the actual system to make sure it hasn't changed since it was documented above?
A) D-Feet to the rescue. It's a very handy utility for exploring the active DBus interfaces on the system. I couldn't get it to work off the installation - possibly due to some path problem. However, I could get it to work off of the local directory by running:
d-feet -l

Q) How do I get an application_id and XID to pass to the DBus method call ?
A) I spent quite a lot of time poring through GTK documentation to figure this out - but couldn't find an easy solution. What I did find from my experiments is that these parameters are not really required for the method call to work. You can pass any string value as the app_id and any integer value as the XID and the call will still work correctly. It's an ugly hack, but as they say, perfect is the enemy of good :)

Q) How can I quickly check that the DBus method calls work?
A) I first used dbus-send as follows:
dbus-send --session --dest=org.gnome.SessionManager --type=method_call --print-reply --reply-timeout=20000 /org/gnome/SessionManager org.gnome.SessionManager.Inhibit string:"myApp" uint32:0 string:"Inhibiting" uint32:8

To see if the method call worked, I used the IsInhibited method.
dbus-send --session --dest=org.gnome.SessionManager --type=method_call --print-reply --reply-timeout=20000 /org/gnome/SessionManager org.gnome.SessionManager.IsInhibited uint32:8

The biggest gotcha here is that IsInhibited will always return false. This is because the Gnome Session Manager automatically removes the inhibition if the process calling Inhibit dies. Since Inhibit was called from the dbus-send process which immediately terminates, the inhibition is already removed by the time IsInhibited is called. I spent several hours cursing and losing tufts of hair to figure that one out.
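The solution is to keep the calling process alive by using something like IPython. I opened a new console window, started IPython and typed the following commands in it:
import dbus
bus = dbus.SessionBus()
proxy = bus.get_object('org.gnome.SessionManager','/org/gnome/SessionManager')
# Assumed final step (missing from this listing): same arguments as the dbus-send call above
proxy.Inhibit("myApp", dbus.UInt32(0), "Inhibiting", dbus.UInt32(8), dbus_interface='org.gnome.SessionManager')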

Then I used dbus-send to call IsInhibited from a different console window and voila!

And this finally leads us to the end. Rather than reinventing the wheel, I decided to contribute the outcome of my research back to the excellent Caffeine application. My branch of Caffeine in Launchpad now has full support for inhibiting the screensaver in Gnome 3. Hopefully the lead developers will accept my changes back into the trunk and it will soon be widely available to everybody else.

Edit: My changes have been accepted and merged back in to the Caffeine trunk.

Sunday, 5 June 2011

Adding your own launchers to gnome-shell dash

This one has been driving me nuts for days!

I have a few applications that I install in my home directory and never copy to the global /usr directories. Naturally, they don't show up in the applications menu. In Gnome 2, adding a shortcut to such an application was as simple as right clicking the panel and selecting "Create new launcher". Unfortunately, since Gnome 3 seems to be designed by Mac users with single mouse buttons, right clicks have become some sort of a taboo. (Why can't I right click the desktop to change the wallpaper any more? What is wrong with that dammit!) Worse than that, only applications appearing in the "applications" menu can be added to the dash. So what do you do if you want to have quick access to one of the locally installed apps?

After a bit of Googling, I found the answer.

So, the steps are:

  1. Create a desktop file in ~/.local/share/applications

  2. Reload gnome-shell (Press Alt+F2, type r and press enter)

  3. Now your app will show up in the applications list. Drag and drop to the dash to create the launcher
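For step 1, here is a rough sketch of generating such a desktop file with Python. The keys used (Type, Name, Exec, Terminal, Icon, Comment) are standard Desktop Entry keys, but the helper names, the application name and the paths are all hypothetical placeholders, so adjust them to your own app.

```python
import os


def build_desktop_entry(name, exec_path, icon=None, comment=None):
    """Return the text of a minimal .desktop file for a graphical application."""
    lines = [
        "[Desktop Entry]",
        "Type=Application",
        "Name=" + name,
        "Exec=" + exec_path,
        "Terminal=false",
    ]
    # Icon and Comment are optional, so only add them when given.
    if icon:
        lines.append("Icon=" + icon)
    if comment:
        lines.append("Comment=" + comment)
    return "\n".join(lines) + "\n"


def install_launcher(file_name, contents, applications_dir=None):
    """Write the entry into the per-user applications directory and return its path."""
    if applications_dir is None:
        applications_dir = os.path.expanduser("~/.local/share/applications")
    os.makedirs(applications_dir, exist_ok=True)
    path = os.path.join(applications_dir, file_name)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(contents)
    return path


if __name__ == "__main__":
    # Hypothetical application living under the home directory.
    print(build_desktop_entry("My App", "/home/user/apps/myapp/bin/myapp"))
```

Calling install_launcher("myapp.desktop", entry) writes the file; the shell still has to be reloaded before the app shows up in the list.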

I am all for innovation and pushing boundaries, and  I am trying to keep a very open mind about Gnome 3. There are some pretty good ideas in there, and I appreciate the arduous task they have taken on. But come on, why do such blindingly simple things like this have to be so complicated?


Friday, 3 June 2011


Interesting Google tech talk by Jim Gettys about Bufferbloat.

Also check out Netalyzr to test and gather information about your network connection.


Saturday, 21 May 2011

Strong, Soft, Weak and Phantom References in Java

If you read any documentation on Java garbage collection, the term "strong/weak/soft/phantom reference" crops up quite frequently. It's easy to guess what strong references are, but what the heck is a phantom/weak/soft reference?

I stumbled on a very informative post about each of the reference types. It is an article that perhaps every Java programmer not familiar with this topic should read. To quote the author of the article: "If you don't know what they are, how will you know when to use them?"


I am paraphrasing the main contents of the article below for posterity and for my own reference:

Strong references:

Just like it says on the tin, these are the vanilla object references that we are all used to. For example, in the following code snippet, someClassObj and someOtherObject are both strong references.

public class SomeClass {
	private SomeObject someOtherObject = new SomeObject();

	public static void main(String[] args) {
		SomeClass someClassObj = new SomeClass();
	}
}

As long as there is a strong reference to an object, it cannot be garbage collected.


Weak References

In certain situations, strong references can be a pain to manage and lead to nasty memory leaks. Imagine a global object cache where object references are stored for faster retrieval. The burden is on the programmer to manage the memory used by the cache. If objects that are no longer needed elsewhere are not removed from the cache manually, they will keep using memory unnecessarily. As far as the garbage collector is concerned, these objects are still alive because the cache is holding strong references to them. Therefore, the programmer ends up doing extra work to manage the memory used by the program.

The above unfortunate situation can be made much better by offloading the bulk of the work to the garbage collector itself. The way to achieve this is to store weak references in the cache. Weak references are a hint to the garbage collector that the memory occupied by those objects can be reclaimed if no other links to them are found during the link traversal phase of the collector.

WeakReference<SomeClass> myWeakReference = new WeakReference<SomeClass>(someClassObj);
SomeClass objectRef = myWeakReference.get(); // null once the object has been collected

The get method of the weak reference object will return null if the object that it points to has been garbage collected. This takes care of the memory used by the unused object, but what about the WeakReference object itself? ReferenceQueue to the rescue!
ReferenceQueue<SomeClass> refQueue = new ReferenceQueue<SomeClass>();
WeakReference<SomeClass> myWeakReference1 = new WeakReference<SomeClass>(someClassObj1, refQueue);
WeakReference<SomeClass> myWeakReference2 = new WeakReference<SomeClass>(someClassObj2, refQueue);

If a reference to a ReferenceQueue object is passed to the WeakReference object when it is constructed, the garbage collector will automatically put the WeakReference object into the queue once the object it points to is garbage collected. Now all we need to do is to periodically look through the reference queue and dispose of any useless weak refs.
Java provides a handy WeakHashMap class that automatically handles the cleanup when its entries are garbage collected.


Soft References

Soft references are a less eager form of weak references. Generally, objects pointed to by soft references will stay in memory as long as there's enough memory to go around.


Phantom References

The main difference between a weak and a phantom reference is that the get method of a phantom reference will always return null.  The get method of a weak reference returning null does not necessarily mean that the pointed object is removed from the memory. The garbage collector is yet to call the finalizer on that object, and hence there is a slim chance that the object could resurrect itself by virtue of having a weird finalize method that creates a strong reference to it.

Objects pointed to by phantom references have had their finalizers executed and are already physically removed from memory. Therefore, phantom refs only serve to indicate that a certain piece of memory has been reclaimed.

Monday, 16 May 2011

The Python Challenge

The Python challenge is an online riddle inspired by notpron. The catch is that each level requires you to write some Python code (or Perl, Ruby etc. etc. if you are so inclined) to arrive at the solution. The hardest part is figuring out the cryptic clues. The Python bit is easy; so far, the longest piece of Python I have written to solve a riddle is about 6 lines. (Admittedly, I am still on level 6. But according to the forum posts, none of the challenges require a lot of code)

It's a very challenging but fun way to learn the intricacies of Python, regardless of whether you are a novice or a pro. Give it a try.

Sunday, 15 May 2011

Javac type inference bug

Scenario: A generics based configuration reader that worked perfectly fine under Eclipse, suddenly started failing inside a Jenkins build. The cause was a compilation error: "type parameters of <T>T cannot be determined; no unique maximal instance exists for type variable T with upper bounds int,java.lang.Object".

According to this bug report, this is an almost 6-year-old javac bug. Apparently Java 1.7 fixes it, but I haven't had a chance to verify it. Here's some sample code to reproduce it:

package com.lucidelectricdreams;

public class GenericsTest {
	public <T> T genericReturnTest(int typeToReturn) {
		switch (typeToReturn) {
			case 1: // integer
				return (T)(new Integer(12));

			case 2: // double
				return (T)(new Double(56.2D));

			default: // String
				return (T)"test";
		}
	}

	public static void main(String[] args) {
		GenericsTest gt = new GenericsTest();

		int intRetVal = gt.genericReturnTest(1);
		double doubleRetVal = gt.genericReturnTest(2);
		String stringRetVal = gt.genericReturnTest(3);
	}
}



Changing the main method as follows gets rid of the compiler error:

public static void main(String[] args) {
	GenericsTest gt = new GenericsTest();

	int intRetVal = gt.<Integer>genericReturnTest(1);
	double doubleRetVal = gt.<Double>genericReturnTest(2);
	String stringRetVal = gt.genericReturnTest(3);
}


Thursday, 12 May 2011

Kill tasks in Windows through the command line

I received a frantic phone call last night from my dad, who lives a few thousand miles away, across a few oceans. "My computer is not working. Something is wrong with my hard disk" - he said. "Fix it!". Although it was flattering that my dad thinks that I could magically fix a bad hard drive from a few thousand miles away, it didn't sound quite right because he had just recently bought that machine. I will spare you, my dear readers, the painful 10 minutes that I spent talking in a very slow and calm voice to my dad to figure out what was really wrong. Eventually I managed to figure out that his machine was infected with the "Windows Recovery" virus.

Removing this infection is simple enough if you follow the excellent instructions in the bleepingcomputer guide. However, what can you do when you don't have physical access to the machine? Luckily the LogMeIn app I had installed some time back was still running and accessible, so I could access the computer remotely. However, trying to download the rkill application to stop the virus was impossible because it was blocking all DNS requests out of the machine. The task manager was disabled by the virus as well, so pressing Ctrl+Alt+Del didn't work either. Restarting the computer in safe mode would cut off my remote access through LogMeIn. Asking my dad to press even a single key takes more than 5 minutes of explanations and several wrong attempts - so telling him what to do was not an option either.

The Solution:


There is a little-known command in Windows named tasklist, which does much the same thing as the Linux ps command. Running it at a command prompt displays a list of all running processes along with their PIDs. To kill any process, type tskill followed by the PID. For example, to kill PID 2476, type:

tskill 2476
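If the process list is long, picking out the PID by eye gets tedious. As a rough sketch, the CSV output of tasklist (tasklist /FO CSV /NH) can be filtered with a few lines of Python. The helper names and the sample process list below are made up for illustration; the first two CSV columns are the image name and the PID.

```python
import csv
import io


def parse_tasklist_csv(output):
    """Parse text produced by `tasklist /FO CSV /NH` into (image name, PID) pairs."""
    processes = []
    for row in csv.reader(io.StringIO(output)):
        # Skip blank or malformed rows instead of failing on them.
        if len(row) < 2 or not row[1].isdigit():
            continue
        processes.append((row[0], int(row[1])))
    return processes


def find_pids(output, image_name):
    """Return the PIDs of every process whose image name matches, ignoring case."""
    wanted = image_name.lower()
    return [pid for name, pid in parse_tasklist_csv(output) if name.lower() == wanted]


# Hypothetical sample output, not taken from a real machine.
SAMPLE = (
    '"System Idle Process","0","Services","0","24 K"\n'
    '"explorer.exe","1532","Console","1","45,120 K"\n'
    '"suspicious.exe","2476","Console","1","12,004 K"\n'
)

if __name__ == "__main__":
    print(find_pids(SAMPLE, "suspicious.exe"))
```

Each PID it returns can then be passed to tskill as shown above.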

Pretty simple, but very handy command for those sticky situations!




Saturday, 26 March 2011

Hadoop Installation Gotchas

Over the last few years, with most large scale data processors such as Google, Yahoo, Amazon, Twitter and Facebook open-sourcing their internal data crunching algorithms/software, "BigData" projects have taken off exponentially. We are now in the BigData era of computing, where a large number of companies and individuals are contributing to or adopting software stacks such as  Hadoop, Cassandra, MongoDB, Riak, Redis etc. for large scale distributed data processing. Several sub industries have been spawned to provide technical support, hardware and SaaS for BigData as well. What is most appealing and revolutionary about this industry is the fact that open source software is at the heart of it. Any company or individual can dip into the BigData pot without spending large amounts of money and without being held hostage by vendors.

For a recent experiment, I had to install and configure a simple two node Hadoop cluster. In this post, I will highlight a few gotchas that I encountered, the solutions to which were not properly documented anywhere. Hopefully this will save someone else a little bit of time.

I will not go through the process of installing Hadoop, as it is well documented in numerous places. (Michael G. Noll's Hadoop tutorial is an excellent starting point for newbies.)

Gotcha: Server at x not available yet, Zzzzz...
I encountered this error message after I set up my second node and started the whole cluster up. The datanode logs were full of the above message and the dfs health check showed the available space as 0 bytes.

This seems to be a Java bug in resolving names from /etc/hosts. Java networking libraries cache host name resolutions forever. Setting the networkaddress.cache.ttl JVM property to a reasonable number such as 10 should solve this problem, but I had no luck with it. In the end, I managed to get around the issue by using IP addresses instead of hostnames in the Hadoop *-site.xml configuration files. Since I was setting up a development cluster, this was acceptable. In a production environment, it is very unlikely that you will be relying on /etc/hosts to resolve hosts anyway. Therefore, it's a minor annoyance that you will only encounter during a trial installation like mine.

Gotcha: Error reading task output, Too many fetch-failures
I started seeing a bunch of these messages while a job was running in the cluster. Although the job completed successfully, these failures delayed the job quite a lot.

This is a problem with /etc/hosts configuration. If you have multiple aliases for the localhost, make sure the hostname alias that is in the Hadoop configuration file, is the first in the list.
For example, if the current /etc/hosts file looks like: localhost.localdomain localhost hadoop-slave1

Change it to: hadoop-slave1 localhost.localdomain localhost
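If you have several nodes to fix, the reordering can be scripted. The following is only a sketch: the function name put_hostname_first is made up, and the IP address in the example is a placeholder, not the one from my cluster.

```python
def put_hostname_first(hosts_line, hostname):
    """Return an /etc/hosts line with `hostname` moved to the front of its names."""
    stripped = hosts_line.strip()
    # Leave comments and blank lines untouched.
    if not stripped or stripped.startswith("#"):
        return hosts_line
    fields = stripped.split()
    address, names = fields[0], fields[1:]
    if hostname not in names:
        return hosts_line
    # Keep the remaining aliases in their original order.
    reordered = [hostname] + [name for name in names if name != hostname]
    return " ".join([address] + reordered)


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder address used purely for illustration.
    line = "192.0.2.10 localhost.localdomain localhost hadoop-slave1"
    print(put_hostname_first(line, "hadoop-slave1"))
```

It only rewrites a single line, so reading and writing /etc/hosts itself (as root, after taking a backup) is left to the caller.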

Gotcha:  java.lang.NoClassDefFoundError for classes defined in external libraries
Any non-trivial MapReduce job will need to reference external libraries. However, even if the HADOOP_CLASSPATH variable is correctly set, when the job is run by Hadoop, you will get NoClassDef errors for any classes defined in these external library jars.

This took a bit of searching to find out. Hadoop expects all external dependencies to be in a directory named lib inside the job jar. Use the following Maven assembly descriptor to create a job jar that conforms to this convention. (Make sure the core Hadoop dependencies in the POM are set to the provided scope.)


I will continue to update this post as I continue to experiment with Hadoop. If you spot any errors or know of a more elegant solution to some of these problems, please leave a comment.

Monday, 3 January 2011

Fedora 14 64 bit : Distorted Sounds From Flash Player "SQUARE"

If you have been using the new 64 bit beta of Flash Player ("Square") on Fedora 14, you might have noticed that some audio streams are distorted by a weird metallic noise that seems to emanate from the background. Apparently this is caused by a patch to glibc - which removes support for overlapping regions in memcpy. Although this is the right thing to do and clearly Adobe is using memcpy in a non-standard way, it's an annoying bug for non-techie users who don't really care much about whether an application is doing the "right thing" under the hood. 

There are several solutions for the problem at the moment:
I have tried the first two and they both work perfectly.

For those interested, the full discussion can be found in the Bugzilla thread.