Wednesday 13 November 2013

No FileSystem for scheme: hdfs

I recently ran into the "No FileSystem for scheme: hdfs" error in a Scala application that I wrote to work with some HDFS files. I was bundling the Cloudera Hadoop 2.0.0-cdh4.4.0 libraries with my application, and it turned out that the FileSystem service definition for HDFS was missing from the META-INF/services directory. Hadoop 2 discovers FileSystem implementations through these service files, so without the entry it cannot map the hdfs scheme to a class. The fix is as follows:

  1. Create a META-INF/services directory in src/main/resources and add a file named org.apache.hadoop.fs.FileSystem:
    mkdir -p src/main/resources/META-INF/services
    touch src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
    
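  2. Edit the org.apache.hadoop.fs.FileSystem file and add the following content. Only the last line, which registers the HDFS implementation, is needed to fix this particular error; the other entries can be omitted if you do not use those file systems: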
    org.apache.hadoop.fs.LocalFileSystem
    org.apache.hadoop.fs.viewfs.ViewFileSystem
    org.apache.hadoop.fs.s3.S3FileSystem
    org.apache.hadoop.fs.s3native.NativeS3FileSystem
    org.apache.hadoop.fs.kfs.KosmosFileSystem
    org.apache.hadoop.fs.ftp.FTPFileSystem
    org.apache.hadoop.fs.HarFileSystem
    org.apache.hadoop.hdfs.DistributedFileSystem
    
  3. If you are using the Maven assembly plugin to create a fat jar, add the following stanza to assembly.xml inside the <assembly> tag so that META-INF/services files from the bundled jars are merged instead of overwriting each other:
        <containerDescriptorHandlers>
            <containerDescriptorHandler>
                <handlerName>metaInf-services</handlerName>
            </containerDescriptorHandler>
        </containerDescriptorHandlers>
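
To confirm that the fix took effect, it can help to list which FileSystem implementations are actually registered on the classpath of the packaged application. The following is a minimal sketch using only the JDK; the class name ServiceFileCheck and its methods are hypothetical helpers made up for illustration and are not part of Hadoop. It reads every META-INF/services/org.apache.hadoop.fs.FileSystem file visible to the class loader and parses it the way a service file is normally read (blank lines and # comments are ignored).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
class ServiceFileCheck {
    static final String SERVICE = "org.apache.hadoop.fs.FileSystem";
    static final String HDFS_IMPL = "org.apache.hadoop.hdfs.DistributedFileSystem";

    // Returns the provider class names in a service file's text,
    // skipping blank lines and anything after a '#' comment marker.
    static List<String> parseProviders(String content) {
        List<String> providers = new ArrayList<>();
        for (String line : content.split("\\R")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            line = line.trim();
            if (!line.isEmpty()) {
                providers.add(line);
            }
        }
        return providers;
    }

    // Collects the providers from every matching service file on the classpath.
    static List<String> providersOnClasspath(ClassLoader loader, String service)
            throws IOException {
        List<String> providers = new ArrayList<>();
        Enumeration<URL> files = loader.getResources("META-INF/services/" + service);
        while (files.hasMoreElements()) {
            StringBuilder text = new StringBuilder();
            try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                    files.nextElement().openStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    text.append(line).append('\n');
                }
            }
            providers.addAll(parseProviders(text.toString()));
        }
        return providers;
    }

    public static void main(String[] args) throws IOException {
        List<String> found =
                providersOnClasspath(ServiceFileCheck.class.getClassLoader(), SERVICE);
        System.out.println("Registered FileSystem implementations: " + found);
        System.out.println("HDFS registered: " + found.contains(HDFS_IMPL));
    }
}
```

Run it with the fat jar on the classpath. If the last line prints false, the service entry for HDFS did not make it into the jar.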
    

Saturday 9 November 2013

Reducing the number of threads created by the ElasticSearch Transport Client

This is a quick snippet saved for posterity so that I won't have to pull my hair out the next time I run into this issue. To reduce the number of worker threads spawned by the ElasticSearch transport client (the default is 2 x num_cores), set the following option on the settings object used to create the client:
    Settings settings = ImmutableSettings.settingsBuilder().put("transport.netty.workerCount", NUM_THREADS).build();
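
One way to check whether the setting had any effect is to count the live threads before and after the change. The sketch below uses only the JDK and does not touch the ElasticSearch API. The class ThreadCount and the thread names in the demo are hypothetical; the name fragment that identifies the client's worker threads depends on the ElasticSearch version, so take it from a thread dump of your own application rather than from this example.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical helper for illustration only; not part of ElasticSearch.
class ThreadCount {
    // Counts live threads in this JVM whose name contains the given fragment.
    static long countThreads(String nameFragment) {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && t.getName().contains(nameFragment))
                .count();
    }

    public static void main(String[] args) {
        // The default worker count mentioned above: 2 x number of cores.
        int defaultWorkers = Runtime.getRuntime().availableProcessors() * 2;
        System.out.println("Default worker count on this machine: " + defaultWorkers);

        // Demo threads with made-up names, standing in for the client's workers.
        CountDownLatch done = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            Thread worker = new Thread(() -> {
                try {
                    done.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "demo_worker_" + i);
            worker.setDaemon(true);
            worker.start();
        }
        System.out.println("Threads matching demo_worker_: " + countThreads("demo_worker_"));
        done.countDown();
    }
}
```

In a real application, call countThreads with the fragment taken from your thread dump once the client has connected, and compare the result with the NUM_THREADS value you configured.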