Sunday 7 June 2015

Fedora 22 - Unresponsive Mirrors and DNF

After installing a fresh copy of Fedora 22, I discovered that I could not install any packages due to a faulty mirror. DNF, the new package manager, failed to detect the faulty mirror and move on to the next available one. Here are a few tweaks I made to get DNF to work even in the face of bad mirrors.

Manually install the latest update to librepo 

At the time of writing, a bug fix to detect unresponsive mirrors is available in the updates-testing repo. Normally it could be installed with a command like sudo dnf --enablerepo=updates-testing install librepo. However, because of the broken mirror, that command will not succeed. Instead, manually download librepo-1.7.16-1.fc22.$ARCH.rpm, python-librepo-1.7.16-1.fc22.$ARCH.rpm and python3-librepo-1.7.16-1.fc22.$ARCH.rpm from http://koji.fedoraproject.org/koji/buildinfo?buildID=639899 (where $ARCH is your machine architecture, for example x86_64) and install them as follows:

sudo rpm -Uvh librepo-1.7.16-1.fc22.x86_64.rpm python-librepo-1.7.16-1.fc22.x86_64.rpm python3-librepo-1.7.16-1.fc22.x86_64.rpm
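
If you are not on x86_64, the file names differ only in the architecture suffix. The following is a small sketch, not part of the original fix: the helper name librepo_rpms is hypothetical, and it assumes that the output of uname -m matches the suffix used in the Koji file names.

```shell
# Print the three librepo RPM file names for one architecture.
# The version string is the one quoted above; the function name is made up
# for this example. Defaults to the architecture reported by `uname -m`.
librepo_rpms() {
  arch="${1:-$(uname -m)}"
  for name in librepo python-librepo python3-librepo; do
    printf '%s-1.7.16-1.fc22.%s.rpm\n' "$name" "$arch"
  done
}

# Usage (run in the directory that holds the downloaded files):
#   sudo rpm -Uvh $(librepo_rpms)
```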

Enable fastest mirror 

Edit /etc/dnf/dnf.conf and add the line fastestmirror=True. For reference, my complete dnf.conf looks as follows:

[main]
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=True
fastestmirror=True
timeout=10
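
If you would rather script the change than edit the file by hand, something like the sketch below should work. The function name is hypothetical, and it assumes that [main] is the only section in the file (as in the stock dnf.conf), so appending a line at the end is safe. It relies on GNU sed's -i option, which is what Fedora ships.

```shell
# Set fastestmirror=True in a dnf.conf-style file without duplicating the line.
# Assumes [main] is the last (or only) section in the file.
enable_fastestmirror() {
  conf="$1"
  if grep -q '^fastestmirror=' "$conf"; then
    # The option already exists: force its value to True.
    sed -i 's/^fastestmirror=.*/fastestmirror=True/' "$conf"
  else
    # The option is missing: append it.
    printf 'fastestmirror=True\n' >> "$conf"
  fi
}

# Usage (as root):
#   enable_fastestmirror /etc/dnf/dnf.conf
```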


Disable repo failovermethod (Optional)

If you are having problems with a particular repo, edit the corresponding file in /etc/yum.repos.d/ and comment out the line that reads: failovermethod=priority. This instructs DNF to try any mirror instead of the first one in the list received from the metadata server.
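
The same edit can be made with sed. This is only a sketch with a hypothetical function name; it comments the line out rather than deleting it, so the change is easy to revert.

```shell
# Comment out "failovermethod=priority" in a .repo file.
# Running it twice is harmless: an already commented line no longer matches.
disable_failover() {
  sed -i 's/^failovermethod=priority/#&/' "$1"
}

# Usage (as root; the file name is an example, pick the repo giving you trouble):
#   disable_failover /etc/yum.repos.d/fedora.repo
```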

Wednesday 13 November 2013

No FileSystem for scheme: hdfs

I recently came across the "No FileSystem for scheme: hdfs" error from a Scala application that I wrote to work with some HDFS files. I was bundling the Cloudera Hadoop 2.0.0-cdh4.4.0 libraries with my application and it turns out that the FileSystem service definition for HDFS was missing from the META-INF/services directory. The fix is as follows:

  1. Create a META-INF/services directory in src/main/resources and add a file named org.apache.hadoop.fs.FileSystem
    mkdir -p src/main/resources/META-INF/services
    touch src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem
    
  2. Edit the org.apache.hadoop.fs.FileSystem file and add the following content. You can omit everything except the last line, which registers the HDFS implementation.
    org.apache.hadoop.fs.LocalFileSystem
    org.apache.hadoop.fs.viewfs.ViewFileSystem
    org.apache.hadoop.fs.s3.S3FileSystem
    org.apache.hadoop.fs.s3native.NativeS3FileSystem
    org.apache.hadoop.fs.kfs.KosmosFileSystem
    org.apache.hadoop.fs.ftp.FTPFileSystem
    org.apache.hadoop.fs.HarFileSystem
    org.apache.hadoop.hdfs.DistributedFileSystem
    
  3. If you are using the Maven assembly plugin to create a fat jar, the following stanza should be added to assembly.xml inside the <assembly> tag
        <containerDescriptorHandlers>
            <containerDescriptorHandler>
                <handlerName>metaInf-services</handlerName>
            </containerDescriptorHandler>
        </containerDescriptorHandlers>
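
Steps 1 and 2 can be combined into one small script if only the HDFS entry is needed. This is a sketch under that assumption; the function name is hypothetical and the jar name in the usage comment is a placeholder.

```shell
# Create the service definition file containing only the HDFS entry.
# The argument is the project root (defaults to the current directory).
write_hdfs_service() {
  dir="${1:-.}/src/main/resources/META-INF/services"
  mkdir -p "$dir"
  printf '%s\n' 'org.apache.hadoop.hdfs.DistributedFileSystem' \
    > "$dir/org.apache.hadoop.fs.FileSystem"
}

# After building, you can check that the entry made it into the fat jar:
#   unzip -p target/<your-fat-jar>.jar META-INF/services/org.apache.hadoop.fs.FileSystem
```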
    

Saturday 9 November 2013

Reducing the number of threads created by the ElasticSearch Transport Client

This is a quick snippet saved for posterity so that I won't have to pull my hair out the next time I run into this issue. To reduce the number of threads spawned by the ElasticSearch transport client (default is 2 x num_cores) add the following option to the settings object:
Settings settings = ImmutableSettings.settingsBuilder().put("transport.netty.workerCount",NUM_THREADS).build();

Saturday 8 June 2013

Setting up dnscrypt on Fedora

DNSCrypt is a free service by OpenDNS that provides encrypted DNS lookups. If you are concerned about man-in-the-middle attacks, data collection/spying by various entities or ad injections by unscrupulous ISPs, encrypting your DNS lookups is a good starting point. Bear in mind that just encrypting your DNS lookups will not make you secure online. It has to be used in conjunction with a lot of other tools and services if you really want to safeguard your privacy.

  1. Download the DNSCrypt tarball from http://download.dnscrypt.org/dnscrypt-proxy/ . At the time of writing, the latest version was dnscrypt-proxy-1.3.0.tar.gz
  2. Extract the tarball, then build and install it:
    tar xvf dnscrypt-proxy-1.3.0.tar.gz && cd dnscrypt-proxy-1.3.0
    ./configure
    make -j4
    sudo make install
    
  3. Create a new system user to run the service:
    sudo adduser -m -N  -r -s /bin/false dnscrypt
  4. Now start the service in the foreground to make sure everything is working:
    sudo dnscrypt-proxy -u dnscrypt
  5. Change your system DNS server to 127.0.0.1. There are many ways to do this. The adventurous can edit the appropriate script in /etc/sysconfig/network-scripts/. If you don't have NetworkManager installed, editing /etc/resolv.conf would work too. Gnome users: click on the network icon, click 'Network Settings', select the connection and click 'Options'. Then in the 'IPv4 Settings' tab, set the 'Method' to 'Automatic (DHCP) Addresses Only' and type in 127.0.0.1 in the 'DNS Servers' text box.
  6. Restart the network service for the DNS server changes to take effect.
    sudo systemctl restart network.service
  7. Now you can verify that the changes have taken effect by running dig google.com and checking the output for the line: SERVER: 127.0.0.1#53(127.0.0.1). Alternatively, navigate to http://www.opendns.com/welcome/ using a web browser. The screen will tell you whether you are using OpenDNS or not.
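
The dig check in step 7 can be scripted if you want to run it more than once. A minimal sketch, with a hypothetical function name, that looks for the SERVER line quoted above in dig's output:

```shell
# Read dig output on stdin and succeed only if the answer came from 127.0.0.1:53.
uses_local_resolver() {
  grep -q 'SERVER: 127\.0\.0\.1#53'
}

# Usage:
#   dig google.com | uses_local_resolver && echo "lookups go through the local proxy"
```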
To run the dnscrypt-proxy service on system startup, create a systemd service as follows:
  1. As root, create the file /etc/systemd/system/dnscrypt.service with the following content:
  2. Refresh the system daemon:
    sudo systemctl daemon-reload
  3. Enable the service so that it starts automatically on every boot (sudo systemctl enable dnscrypt.service). You can also start or stop it manually with the usual systemctl commands:
    sudo systemctl start dnscrypt.service
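
For step 1, a unit file along the following lines should be enough. Treat it as a sketch rather than the exact file: it assumes that make install placed the binary in /usr/local/sbin (the default prefix) and that the proxy runs in the foreground as the dnscrypt user created earlier. The function name and the Description text are made up for this example.

```shell
# Write a minimal systemd unit for dnscrypt-proxy to the given path.
# Assumption: the binary lives at /usr/local/sbin/dnscrypt-proxy.
write_dnscrypt_unit() {
  cat > "$1" <<'EOF'
[Unit]
Description=DNSCrypt proxy
After=network.target

[Service]
ExecStart=/usr/local/sbin/dnscrypt-proxy -u dnscrypt
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Usage:
#   write_dnscrypt_unit dnscrypt.service
#   sudo install -m 644 dnscrypt.service /etc/systemd/system/dnscrypt.service
```

The [Install] section is what lets systemctl enable hook the service into the boot sequence.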

Saturday 9 February 2013

Mounting a Nexus 7 on Fedora

If you need to transfer several gigabytes of data between your computer and the Nexus 7 tablet, the fastest option is to mount the N7 as an MTP file system. One way to achieve this is through the mtpfs tool, which can be found in the Fedora repositories. However, I found it to be buggy and slow. The better option, as mentioned in Linux Format 165, is to use jmtpfs. The installation instructions in the source were a bit out of date, so here's how to compile jmtpfs:

sudo yum install libmtp-devel fuse-devel file-devel
git clone https://github.com/kiorky/jmtpfs.git && cd jmtpfs
./configure && make
sudo make install

Once installed, mounting the N7 is a breeze. Use the USB cable to connect the tablet to the computer and then create a directory where you want to mount it.
mkdir Nexus
jmtpfs Nexus

To unmount, use the command:
fusermount -u Nexus

Saturday 29 December 2012

Recovering from Cassandra "Value already present" error

This is a write-up of the process I recently went through with a sysadmin colleague to recover from a crippling Cassandra failure that threatened to make all of our data inaccessible. I am deviating from my usual step-by-step instruction style here because I am not quite confident that this particular method is applicable to all situations. We had a clear mandate about which sets of data had to be absolutely saved and which could be "let go". This might not be the case for you.  Hopefully, reading about our experience could help you to figure out the best course of action for your situation.

Platform: DataStax Cassandra 1.1.7 (community edition) running on a five node cluster hosted on Amazon EC2.

It all started when our sysadmin brought down the cluster for some maintenance work. This was necessitated by the fact that some of our applications had started failing with connection timeout errors from the Cassandra nodes. The CPU load on the cluster was quite high and it appeared that there were excessive garbage collection attempts happening in the JVMs. My colleague was hoping that tuning the heap size and memtable size would solve this problem.

Config changes done, the cluster was ready to be brought back to life, but that's when things started going awry. The cluster wouldn't start and we saw the following error message in the logs:
Exception encountered during startup
java.lang.IllegalArgumentException: value already present: 687508
at com.google.common.base.Preconditions.checkArgument

A bit of googling revealed that this is a bug in Cassandra that has been fixed in the latest 1.2 branch (http://www.datastax.com/support-forums/topic/cassandra-does-not-startup). It was apparently due to a race condition that allocated the same ID to column families in two different keyspaces. However, nobody had any suggestions about how to recover from it and get an affected cluster back to life.

We enabled debug logging and figured out the keyspaces and column families that were at the root of the issue. We then thought that perhaps if we edited the metadata and changed the ID of one of the offending column families to something else, the cluster might just start back up. The only way we could think of doing this was by using the sstable2json command to look through the system metadata sstables.

Unfortunately, the sstable2json command first tries to load the entire schema into memory using the exact same process that the daemon uses. This obviously fails because of the duplicate CFID. Now we were truly stuck.

I asked my colleague to send over the system/schema_keyspaces and system/schema_columnfamilies directories to see if I could find a way into them. Perhaps running the sstable2json command in a clean environment, without the rest of the data files, would work. I was half right in this regard. The tool no longer failed with the duplicate CFID error, but it now threw a different exception.
sstable2json /home/cell/tmp/schema_columnfamilies/system-schema_columnfamilies-hf-712-Data.db

no non-system tables are defined
Exception in thread "main" org.apache.cassandra.config.ConfigurationException: no non-system tables are defined
 at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:399)

My next step was to download the source code for Cassandra and try to figure out what was causing this error. It turns out that there's an explicit check in the SSTableExport class to not dump any of the system tables. There was no way to switch this behaviour off, so the only option was to modify the source itself. This was quite easy, but the next hurdle I had to clear was getting Cassandra to build locally with my changes. The Ant build script was throwing errors, and there was too much pressure and too little time to figure out why.

Fortunately, I had another idea. The SSTableExport class, which forms the sstable2json tool, is a single Java file that uses the Cassandra core classes to do the bulk of the work. I created a new Java project in my IDE, added all the jar files from the Cassandra binary distribution to the classpath and then created a single source file, which was a copy of the SSTableExport file with my code changes. This compiled without any issues and I was able to generate my own jar file containing the modified code. (https://gist.github.com/4397133)
java -cp /home/cell/apps/dsc-cassandra-1.1.7/lib/*:/home/cell/apps/dsc-cassandra-1.1.7/config:my-ss.jar me.otiose.SSTableExport /home/cell/tmp/schema_columnfamilies/system-schema_columnfamilies-hf-712-Data.db

The above command now produced a JSON representation of the schema_columnfamilies sstable. However, the ID that was causing the problem was nowhere to be found. So the idea of editing it was no longer possible.

My final desperate attempt to salvage the situation was to try a different tack. What if I removed one of the keyspaces that was causing the issue from the metadata? Would Cassandra then ignore it and proceed with cluster startup?

Using the above command, I dumped the schema_keyspaces sstables to JSON and removed the lines that contained references to the keyspace that we were sacrificing. In order to convert the modified JSON files back to sstables, I tried the json2sstable command, but it threw the same exception as the sstable2json command. So it was back to the IDE to edit some more code, and I created a new copy of json2sstable without the system table check. (https://gist.github.com/4397133)
java -cp /home/cell/apps/dsc-cassandra-1.1.7/lib/*:/home/cell/apps/dsc-cassandra-1.1.7/config:my-ss.jar me.otiose.SSTableImport -K system -c schema_keyspaces /home/cell/tmp/modified_sstable.json /home/cell/tmp/schema_keyspace_fixed/system-schema_keyspaces-he-712-Data.db

The above command produces a bunch of Data, Index, sha1 and other files. I copied them all to the system/schema_keyspaces directory on the node, overwriting the existing files. Then to verify that the problem was fixed, I ran the unmodified sstable2json command. Now it worked without any complaints about the duplicate CFID!
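
Since this step overwrites files that Cassandra needs in order to start, it is worth taking a copy of the directory first. The sketch below is a generic precaution rather than a record of the exact commands we ran; the function name is hypothetical and the path in the usage comment assumes the default data directory.

```shell
# Copy a directory to a timestamped sibling and print the backup location.
backup_dir() {
  src="${1%/}"                                  # strip a trailing slash, if any
  dest="${src}.bak.$(date +%Y%m%d%H%M%S)"
  cp -a "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (the path is an assumption; adjust to your data directory):
#   backup_dir /var/lib/cassandra/data/system/schema_keyspaces
```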

I quickly followed the same procedure on the other nodes. (Each node has a unique set of files in the system/schema_keyspaces directory, so you cannot simply copy a single fixed file to every node.) My sysadmin colleague started bringing the nodes up one by one and finally the cluster was back up again, albeit in a very flaky state.

Now it was the time for my colleague to do his magic. One or two nodes had missing schemas for some of the column families. He did a nodetool repair on those and then ran a describe on one of the functioning nodes to obtain a schema definition, which he then ran on the unstable nodes to recreate the schema.

Finally, after a long-fought battle, our cluster was operational again with no data loss. We were quite lucky in this regard because the keyspace that I manually removed from the metadata was due to be dropped anyway. Had that not been the case, I would probably have tried editing the column family metadata itself. We might have experienced some data loss in that case, but that is far preferable to losing all of it.

Saturday 15 September 2012

Powerlines Prompt for Bash

Powerline-bash is a cool Python script that changes your bash prompt depending on the context. For example, if you are in a Git repository, the prompt will display the branch name. If there are any uncommitted changes, a small plus sign will appear. Pretty cool! Get it from https://github.com/milkbikis/powerline-bash and follow the instructions in the README to install. I also had to download the patched font (linked from the README) to get the prompt to display properly.
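
To give a feel for what a context-sensitive prompt does, here is a heavily simplified sketch in plain shell. It is not how powerline-bash is implemented and the function name is made up; it only shows the idea of printing the branch name, plus a plus sign when there are uncommitted changes.

```shell
# Print " (branch)" or " (branch +)" inside a Git repository, nothing elsewhere.
git_prompt_segment() {
  # rev-parse fails outside a repository; in that case print nothing.
  branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" || return 0
  if [ -n "$(git status --porcelain 2>/dev/null)" ]; then
    printf ' (%s +)' "$branch"
  else
    printf ' (%s)' "$branch"
  fi
}

# Usage in ~/.bashrc (single quotes so the function runs at every prompt):
#   PS1='\u@\h \W$(git_prompt_segment)\$ '
```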