A Sourceful of Secrets

Andrew E. Bruno

Archive for the ‘Java’ Category

Counting the number of reads in a BAM file

The output from short read aligners like Bowtie and BWA is commonly stored in SAM/BAM format. When presented with one of these files a common first task is to calculate the total number of alignments (reads) captured in the file. In this post I show some examples for finding the total number of reads using samtools and directly from Java code. For the examples below, I use the HG00173.chrom11 BAM file from the 1000 genomes project which can be downloaded here.

First, we look at using the samtools command directly. One way to get the total number of alignments is to simply dump the entire SAM file and tell samtools to count instead of print (-c option):

$ samtools view -c HG00173.chrom11.ILLUMINA.bwa.FIN.low_coverage.20111114.bam
5218322

If we’re only interested in counting the total number of mapped reads we can add the -F 4 flag. Alternatively, we can count only the unmapped reads with -f 4:

# Mapped reads only
$ samtools view -c -F 4 HG00173.chrom11.ILLUMINA.bwa.FIN.low_coverage.20111114.bam
5068340

# Unmapped reads only
$ samtools view -c -f 4 HG00173.chrom11.ILLUMINA.bwa.FIN.low_coverage.20111114.bam
149982

To understand how this works we first need to inspect the SAM format. The SAM format includes a bitwise FLAG field described here. The -f/-F options to the samtools command allow us to query based on the presence/absence of bits in the FLAG field. So -f 4 outputs only alignments that are unmapped (flag 0x0004 is set) and -F 4 outputs only alignments that are not unmapped (i.e. flag 0x0004 is not set), hence these would only include mapped alignments.
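The bit test behind -f and -F can be sketched in a few lines of Java. This is only an illustration of the filtering rule, not samtools code; the class name FlagFilter and the sample flag values 99 and 77 are made up for the example.

```java
public class FlagFilter {
    // Bit value from the SAM FLAG field: the read itself is unmapped.
    public static final int UNMAPPED = 0x4;

    /**
     * Mirrors the -f/-F rule: keep a read when every bit in `required`
     * is set and no bit in `excluded` is set.
     */
    public static boolean keep(int flag, int required, int excluded) {
        return (flag & required) == required && (flag & excluded) == 0;
    }

    public static void main(String[] args) {
        int mappedRead = 99;   // hypothetical flag: 0x1 + 0x2 + 0x20 + 0x40
        int unmappedRead = 77; // hypothetical flag: 0x1 + 0x4 + 0x8 + 0x40

        // Equivalent of -F 4: keep reads where the unmapped bit is NOT set
        System.out.println(keep(mappedRead, 0, UNMAPPED));   // true
        System.out.println(keep(unmappedRead, 0, UNMAPPED)); // false

        // Equivalent of -f 4: keep reads where the unmapped bit IS set
        System.out.println(keep(unmappedRead, UNMAPPED, 0)); // true
    }
}
```

Passing 0 for either argument means "no constraint", which matches running samtools view without that option.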

For paired-end reads, you could do the following to count the number of reads where both the read itself and its mate are mapped:

$ samtools view -c -f 1 -F 12 HG00173.chrom11.ILLUMINA.bwa.FIN.low_coverage.20111114.bam
4906035

The -f 1 switch only includes reads that are paired in sequencing and -F 12 only includes reads that are not unmapped (flag 0x0004 is not set) and where the mate is not unmapped (flag 0x0008 is not set). Here we add 0x0004 + 0x0008 = 12 and use the -F (bits not set), meaning you want to include all reads where neither flag 0x0004 nor 0x0008 is set. For help understanding the values for the SAM FLAG field there’s a handy web tool here.
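If you'd rather decode a FLAG value locally, here is a rough Java sketch that lists which bits are set. The short names are my own paraphrase of the bit descriptions in the SAM specification, and the class name FlagDecoder is just for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class FlagDecoder {
    // Index i corresponds to bit (1 << i), i.e. 0x1, 0x2, 0x4, ... 0x800.
    static final String[] NAMES = {
        "paired", "proper pair", "unmapped", "mate unmapped",
        "reverse strand", "mate reverse strand", "first in pair", "second in pair",
        "not primary", "fails QC", "duplicate", "supplementary"
    };

    /** Returns the names of all bits that are set in the given FLAG value. */
    public static List<String> decode(int flag) {
        List<String> set = new ArrayList<>();
        for (int i = 0; i < NAMES.length; i++) {
            if ((flag & (1 << i)) != 0) {
                set.add(NAMES[i]);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        // 12 = 0x4 + 0x8, the value passed to -F above
        System.out.println(decode(12)); // prints [unmapped, mate unmapped]
        System.out.println(decode(99)); // prints [paired, proper pair, mate reverse strand, first in pair]
    }
}
```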

There’s also a nice command included in samtools called flagstat which computes various summary statistics. However, I wasn’t able to find much documentation describing the output and it’s not mentioned anywhere in the man page. This post examines the C code for the flagstat command which provides some insight into the output.

$ samtools flagstat HG00173.chrom11.ILLUMINA.bwa.FIN.low_coverage.20111114.bam
5218322 + 0 in total (QC-passed reads + QC-failed reads)
273531 + 0 duplicates
5068340 + 0 mapped (97.13%:-nan%)
5205999 + 0 paired in sequencing
2603248 + 0 read1
2602751 + 0 read2
4881994 + 0 properly paired (93.78%:-nan%)
4906035 + 0 with itself and mate mapped
149982 + 0 singletons (2.88%:-nan%)
19869 + 0 with mate mapped to a different chr
15271 + 0 with mate mapped to a different chr (mapQ>=5)

The above shows a few simple examples using the samtools command but what if you wanted to count the total number of reads in code? I’ve been using the excellent Picard Java library as of late and haven’t found a simple way to do this via the API. I was looking for a fast way to compute this without having to scan the entire BAM file each time. I would love to see this added as a public function to the BAMIndexMetaData object or similar. Here’s a function I wrote to calculate the total mapped reads from a BAM file. This makes use of the BAM index for speed and obviously requires you to first index your BAM file:

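// Sums the aligned (mapped) record counts stored in the BAM index,
// so the BAM file itself is never scanned.
public long getTotalReadCount(SAMFileReader sam) {
    long count = 0; // long, since very large BAM files could overflow an int

    AbstractBAMFileIndex index = (AbstractBAMFileIndex) sam.getIndex();
    int nRefs = index.getNumberOfReferences();
    for (int i = 0; i < nRefs; i++) {
        // Per-reference metadata recorded in the index
        BAMIndexMetaData meta = index.getMetaData(i);
        count += meta.getAlignedRecordCount();
    }

    return count;
}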

This uses the BAMIndex to loop through each reference and sum the total mapped reads. A complete working example is included below:

import java.io.File;

import net.sf.samtools.AbstractBAMFileIndex;
import net.sf.samtools.BAMIndexMetaData;
import net.sf.samtools.SAMFileReader;

public class CountMapped {

    public static void main(String[] args) {
        File bamFile = new File(args[0]);

        SAMFileReader sam = new SAMFileReader(bamFile, 
                                 new File(bamFile.getAbsolutePath() + ".bai"));

        // The index carries per-reference metadata, so the BAM itself is not scanned
        AbstractBAMFileIndex index = (AbstractBAMFileIndex) sam.getIndex();

        long count = 0;
        for (int i = 0; i < index.getNumberOfReferences(); i++) {
            BAMIndexMetaData meta = index.getMetaData(i);
            count += meta.getAlignedRecordCount();
        }
        sam.close();

        System.out.println("Total mapped reads: " + count);
    }

}

Requires the Picard Java library. To compile/run:

$ javac -cp samtools.jar CountMapped.java
$ java -cp samtools.jar:. CountMapped HG00173.chrom11.ILLUMINA.bwa.FIN.low_coverage.20111114.bam
Total mapped reads: 5068340

Written by Andrew

2012/04/13 at 22:31

Posted in Bioinformatics, Java

passtab – store passwords in your wallet

Here’s a quote from Bruce Schneier that essentially sums up the motivation for this post:

We’re all good at securing small pieces of paper. I recommend that people write
their passwords down on a small piece of paper, and keep it with their other
valuable small pieces of paper: in their wallet.

I recently read an excellent blog post by John Graham-Cumming in which he presents an elegant system for writing down your passwords using a Tabula Recta. I was inspired by this concept so I created a tool called passtab which aims to provide a light-weight system for managing passwords based on his idea. This post is about the general usage of passtab and presents some of the password management capabilities. This is not your grandmother’s password manager so if you’re looking for a nice GUI point and click application that’s easy to use you can stop reading right here. This is for hardcore folks who enjoy looking up their passwords in archaic tablets invented by ancient cryptographers with last names like Trithemius. For the impatient, you can grab a copy of the latest version on github.

Introducing passtab

passtab is a light-weight system for managing passwords using a Tabula Recta. passtab has two main features: (1) generating random Tabula Rectas in PDF format for printing and storing in your wallet, and (2) fetching passwords from the Tabula Recta (password management). These features are independent and you can use passtab to only generate PDFs or optionally make use of the password management features. One unique benefit is the ability to have both an electronic and paper copy of your passwords. You can download the binary release of passtab at github here. Unpack the distribution and run ./bin/passtab --help for a list of options. If the startup shell script doesn’t work you can run java -jar lib/passtab-uber.jar --help. The following sections illustrate some use cases of passtab.

Generate a random Tabula Recta in PDF

passtab can generate random Tabula Rectas in PDF format.

$ ./bin/passtab --format pdf --output passtab.pdf
Jun 12, 2011 11:16:29 AM org.qnot.passtab.PassTab generate
INFO: Generating a random Tabula Recta (might take a while)...
$ ls *.pdf
passtab.pdf

Here’s an example PDF generated from passtab. You can now print this PDF out and store it in your wallet!

How to use the Tabula Recta

Here’s a simple example (taken directly from the README). Suppose we have the following Tabula Recta:


    | A B C D E F G H I J K L M N 
  --|----------------------------
  A | _ u } I ` } R ) a < L : a A 
  B | - o ( : p # O % . _ ; ' j L 
  C | w c ( c y 2 h y ~ N O * > w 
  D | o : R m L % V , d H r Y B j 
  E | 9 , < 0 J p a o ) O w 0 w # 
  F | C j i } i z 2 $ O R 5 @ T I 
  G | Q - E m 8 N c / + u W Y V > 
  H | , y } U Y i j i q w q c - 4 
  I | K j W H e ; I ? E 7 H v 2 + 
  J | g * 7 4 E } a h Y z < " : w 
  K | . _ } I / J k 1 a D ^ ; p K 
  L | ` < A L c z } } I P ? 4 y T 
  M | F D < 8 < 0 R B t 9 X o B 2 
  N | I r O E m o a + Y W w ; : 7

And suppose we want to get our password for logging into webmail at acme.com. We decide to use the first and last letter of the domain name as the start row/column of the password and we want a password 8 characters in length. So we start at the intersection of ‘A’ and ‘E’ and read off 8 characters diagonally resulting in the password: `#h,)RWc
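To make the lookup concrete, here is a small Java sketch that reads the diagonal from the table above. It is not passtab's code; the class DiagonalRead and its method are hypothetical, and the table rows are simply the example above with the spaces removed. Note that the first character is the backtick sitting at the intersection of A and E.

```java
public class DiagonalRead {
    static final String HEADER = "ABCDEFGHIJKLMN";

    // Rows A..N of the example Tabula Recta, spaces removed.
    static final String[] ROWS = {
        "_u}I`}R)a<L:aA",
        "-o(:p#O%._;'jL",
        "wc(cy2hy~NO*>w",
        "o:RmL%V,dHrYBj",
        "9,<0Jpao)Ow0w#",
        "Cji}iz2$OR5@TI",
        "Q-Em8Nc/+uWYV>",
        ",y}UYijiqwqc-4",
        "KjWHe;I?E7Hv2+",
        "g*74E}ahYz<\":w",
        "._}I/Jk1aD^;pK",
        "`<ALcz}}IP?4yT",
        "FD<8<0RBt9XoB2",
        "IrOEmoa+YWw;:7"
    };

    /** Reads `length` characters down and to the right, start cell included. */
    public static String readDiagonal(char row, char col, int length) {
        int r = HEADER.indexOf(row);
        int c = HEADER.indexOf(col);
        StringBuilder password = new StringBuilder();
        for (int i = 0; i < length; i++) {
            // This sketch does not handle running off the edge of the table
            password.append(ROWS[r + i].charAt(c + i));
        }
        return password.toString();
    }

    public static void main(String[] args) {
        // acme.com: first letter 'A' picks the row, last letter 'E' the column
        System.out.println(readDiagonal('A', 'E', 8)); // prints `#h,)RWc
    }
}
```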

Defining a scheme for selecting the starting row/column for a given password is completely up to the user and can be as simple or as complex as one desires. The direction for reading the password is also up to the user to define (left, right, diagonally, etc.). See John Graham-Cumming’s excellent blog post for more examples.

This method is slightly more complex than just writing down your passwords on a sheet of paper but the added complexity offers some advantages:

  1. Can store all your passwords on a single sheet of paper
  2. If someone steals this sheet of paper they’ll have a harder time figuring out what your passwords are
  3. Allows you to use strong random passwords
  4. If you want to change your passwords just re-generate a new Tabula Recta. Your scheme for selecting passwords can stay the same

passtab makes no assumptions about how passwords are read nor does it know anything about your scheme (unless you configure it). Now that you don’t have to remember long random passwords anymore, what do you need to remember when using a Tabula Recta?

First, you need to come up with a method for finding the starting position for a given password. In the example above this can be as simple as using characters from a domain/host name. But the beauty is you can be as creative as you want. A scheme that works for most of your passwords would probably be ideal but you can certainly generate multiple Tabula Rectas if you like.

Once you have a way of coming up with a starting location you need to define a method for reading off the password. In passtab this is called a sequence. In the example above we simply read 8 characters diagonally. But again you can be creative here. You could read 8 characters diagonally skipping every 3rd character, etc.

Lastly, you’ll need to remember what to do if you hit the edge of the Tabula Recta before the end of the password. For example, if you start at Z:Z and want to read 8 characters diagonally you can’t, because you have already reached the end of the Tabula Recta. In passtab this is called a collision. In this case we could just continue reading along the edge.

Using the Tabula Recta allows you to make use of long secure random passwords and only have to remember three simple things. You also have all your passwords on a single sheet of paper that fits in your wallet.

Custom Alphabets

In passtab, a Tabula Recta consists of two alphabets: the header alphabet and the data alphabet. The header alphabet is used for the row and column heading of the Tabula Recta and forms the basis for finding the starting location of the passwords. The data alphabet is used to generate the contents of the Tabula Recta and passtab will randomly pick characters from this alphabet using a cryptographically secure random number generator. By default, passtab uses a header alphabet of 0-9A-Z and a data alphabet consisting of all printable ASCII characters. It’s important to keep in mind that the data alphabet directly affects the entropy of your passwords. passtab allows you to customize these alphabets allowing you to generate any kind of Tabula Recta, for example:

$ ./bin/passtab -b A,B,C,D -a 'a,b,c,d,1,2,3,4,!,@,#'
Jun 12, 2011 10:24:26 PM org.qnot.passtab.PassTab generate
INFO: Generating a random Tabula Recta (might take a while)...
  A B C D 
A d 1 @ 4 
B c 4 @ 2 
C b 3 3 ! 
D 1 a @ 4 

Here’s a Tabula Recta using Greek symbols as the header alphabet (here’s the example PDF):

$ ./bin/passtab -b 'Σ,Τ,Π,ρ,ϋ,ψ' -a 'a,b,c,d,1,2,3,4,!,@,#'
Jun 12, 2011 11:26:00 PM org.qnot.passtab.PassTab generate
INFO: Generating a random Tabula Recta (might take a while)...
  Σ Τ Π ρ ϋ ψ 
Σ 1 2 1 d d c 
Τ 1 2 b b @ c 
Π 1 # c 3 2 @ 
ρ 4 2 d 2 @ 3 
ϋ 2 3 b 1 ! b 
ψ d @ # c ! a
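To give a feel for how the data alphabet relates to entropy, here is a minimal Java sketch. It is not how passtab is implemented; the class RandomGrid and both methods are hypothetical. It fills a grid using SecureRandom and estimates the entropy of a password under the assumption that each character is drawn independently and uniformly from the data alphabet.

```java
import java.security.SecureRandom;

public class RandomGrid {
    /** Fills a size x size grid with characters drawn uniformly from dataAlphabet. */
    public static char[][] generate(int size, String dataAlphabet, SecureRandom rng) {
        char[][] grid = new char[size][size];
        for (char[] row : grid) {
            for (int c = 0; c < size; c++) {
                row[c] = dataAlphabet.charAt(rng.nextInt(dataAlphabet.length()));
            }
        }
        return grid;
    }

    /** Entropy in bits of `length` independent, uniformly chosen characters. */
    public static double entropyBits(int alphabetSize, int length) {
        return length * (Math.log(alphabetSize) / Math.log(2));
    }

    public static void main(String[] args) {
        String data = "abcd1234!@#"; // the 11-symbol data alphabet from the example above
        for (char[] row : generate(4, data, new SecureRandom())) {
            System.out.println(new String(row));
        }
        // A smaller data alphabet means fewer bits per character
        System.out.println(entropyBits(data.length(), 8) + " bits for 8 characters");
    }
}
```

With only 11 symbols an 8 character password carries roughly 27.7 bits under that assumption, so a larger data alphabet or a longer sequence is the way to get more.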

Password Management

So this is all well and good, but in reality it can be a huge pain to have to look up your webmail password in a Tabula Recta that’s on a sheet of paper in your wallet every time you log in. For this reason, passtab has some optional features to help read passwords from the Tabula Recta. This allows you to have both a hard copy of the Tabula Recta in your wallet and an electronic version stored on your hard drive for quick access to your passwords. This obviously comes with some security considerations and care must be taken to protect the passtab database as you would any ssh private key, for example. If someone got hold of the passtab database file they could brute force your Tabula Recta. I ended up creating an encrypted thumb drive and storing my passtab configuration and database files on it. You could also use gpg to encrypt it or any other method to protect it from the bad guys. This next section discusses the password management features of passtab.

First some definitions:

  • Direction: a direction to move on the Tabula Recta. Valid values are N,S,E,W,NE,NW,SE,SW
  • Sequence Item: a sequence item consists of a length and direction. For example, 12:SE would mean move 12 characters in the SE direction (diagonally)
  • Sequence: a sequence is a list of sequence items. This allows you to define arbitrary sequences for reading passwords. For example, 4:SE,3:N,1:S would mean read 4 characters SE (diagonally) followed by 3 characters N (up) followed by 1 character S (down)
  • Collision: a collision defines what directions to move if we hit the edge of the Tabula Recta before the end of the password. You can define more than one direction and they will be tried in order. For example, N,NE,E,SE,S,SW,W,NW would mean if we hit a wall try those directions in order until we’re able to move again
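As an illustration of the sequence syntax defined above, here is a rough Java sketch that parses a string like 4:SE,3:N,1:S into sequence items. The class and method names are hypothetical and this is not taken from the passtab source.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceParser {
    public enum Direction { N, S, E, W, NE, NW, SE, SW }

    /** One sequence item such as 12:SE (a length and a direction). */
    public static final class Item {
        public final int length;
        public final Direction direction;

        Item(int length, Direction direction) {
            this.length = length;
            this.direction = direction;
        }
    }

    /** Parses a comma-separated sequence such as "4:SE,3:N,1:S". */
    public static List<Item> parse(String sequence) {
        List<Item> items = new ArrayList<>();
        for (String part : sequence.split(",")) {
            String[] fields = part.trim().split(":");
            if (fields.length != 2) {
                throw new IllegalArgumentException("Expected length:direction but got " + part);
            }
            // Direction.valueOf rejects anything outside N,S,E,W,NE,NW,SE,SW
            items.add(new Item(Integer.parseInt(fields[0]), Direction.valueOf(fields[1])));
        }
        return items;
    }

    public static void main(String[] args) {
        for (Item item : parse("4:SE,3:N,1:S")) {
            System.out.println("read " + item.length + " characters " + item.direction);
        }
    }
}
```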

Generate a Tabula Recta in PDF and save to a passtab database

passtab can generate a Tabula Recta in PDF along with storing it in a passtab database. The passtab database is stored in JSON format and can be easily accessed outside of passtab (any language that can read JSON files). Again, you’ll want to store that JSON file someplace safe. For example:

$ ./bin/passtab --dbsave --name mypasstab
Jun 12, 2011 10:48:33 PM org.qnot.passtab.PassTab generate
INFO: Generating a random Tabula Recta (might take a while)...
$ ls mypasstab.*
mypasstab.json  mypasstab.pdf

Reading passwords from the passtab database

Once we’ve created our passtab database we can now fetch passwords by telling passtab the starting location and the sequence to read. For example, suppose we want to read a password starting at row ‘B’ and column ‘N’ and we want a password 10 characters in length reading diagonally:

$ ./bin/passtab -i mypasstab.json --getpass B:N --sequence 9:SE
o6,ZzH{e$@

Copy the password to the clipboard using xclip:

$ ./bin/passtab -i mypasstab.json --getpass B:N --sequence 9:SE --chomp | xclip

We used 9:SE as our sequence because passtab includes the character at the start location in the password. If we don’t want to include this character we can optionally skip it like so:

$ ./bin/passtab -i mypasstab.json --getpass B:N --sequence 10:SE --skipstart
6,ZzH{e$@_

Define a list of directions to try in the event of a collision. This will try the directions N,S,E,W in order until we can move again. Here we start at Z:Z and can’t move SE (diagonally) so we try N (up) which works so we move N (up) until we hit another collision:

$ ./bin/passtab -i mypasstab.json --getpass Z:Z --sequence 9:SE --collision N,S,E,W
a((vy&0bV&
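Here is one way the collision rule could be sketched in Java. This reflects my reading of the behaviour described above (when the current direction is blocked, switch to the first collision direction that is open and keep moving that way) and is an assumption rather than passtab's actual code. The class CollisionWalk is hypothetical, and the grid is the small A-D example from the Custom Alphabets section.

```java
import java.util.Arrays;
import java.util.List;

public class CollisionWalk {
    public enum Direction {
        N(-1, 0), S(1, 0), E(0, 1), W(0, -1),
        NE(-1, 1), NW(-1, -1), SE(1, 1), SW(1, -1);

        final int dRow, dCol;

        Direction(int dRow, int dCol) {
            this.dRow = dRow;
            this.dCol = dCol;
        }
    }

    static boolean canMove(String[] grid, int row, int col, Direction d) {
        int r = row + d.dRow, c = col + d.dCol;
        return r >= 0 && r < grid.length && c >= 0 && c < grid[r].length();
    }

    /** Reads the start character plus `steps` more, turning on collisions. */
    public static String read(String[] grid, int row, int col, Direction start,
                              int steps, List<Direction> collision) {
        StringBuilder out = new StringBuilder().append(grid[row].charAt(col));
        Direction current = start;
        for (int i = 0; i < steps; i++) {
            if (!canMove(grid, row, col, current)) {
                // Assumed rule: take the first collision direction that is open
                Direction next = null;
                for (Direction d : collision) {
                    if (canMove(grid, row, col, d)) { next = d; break; }
                }
                if (next == null) {
                    throw new IllegalStateException("No open direction at " + row + ":" + col);
                }
                current = next;
            }
            row += current.dRow;
            col += current.dCol;
            out.append(grid[row].charAt(col));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] grid = { "d1@4", "c4@2", "b33!", "1a@4" }; // rows A..D
        List<Direction> collision =
            Arrays.asList(Direction.N, Direction.S, Direction.E, Direction.W);
        // Start at D:D, where SE is blocked right away, so the walk turns N
        System.out.println(read(grid, 3, 3, Direction.SE, 3, collision)); // prints 4!24
    }
}
```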

Conclusion

This post introduced a new tool called passtab for managing passwords using a Tabula Recta. I’m sure it has plenty of bugs so use at your own risk and if by chance you find it somewhat useful I’d be very interested in any feedback.

Written by Andrew

2011/07/01 at 00:15

Posted in Hacks, Java, passtab, passwords

Creating executable jars with Maven

After wrestling with Maven assemblies for a while I finally figured out how to build executable jars. The Maven assembly plugin allows you to define ways to package up your project for distribution by creating various assembly descriptor files. Here’s a quick example of a Maven assembly for building an executable jar (uberjar). For this example we’ll create a brand new project from scratch but it should be easy to see how to integrate into an existing project.

As a first step, let’s create a test project:

$ mvn archetype:create -DgroupId=org.qnot.example -DartifactId=hello-world
$ cd hello-world

Next, add a few dependencies to the project. In this example we’ll add a few libraries from Jakarta Commons. The <dependencies/> section in the pom.xml should now look like this:

  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>1.1</version>
    </dependency>
    <dependency>
      <groupId>commons-lang</groupId>
      <artifactId>commons-lang</artifactId>
      <version>2.3</version>
    </dependency>
  </dependencies>

Create a META-INF/ directory to store the MANIFEST.MF file which defines the main class in the executable jar.

$ mkdir -p src/main/resources/META-INF/
$ echo 'Main-Class: org.qnot.example.App' > src/main/resources/META-INF/MANIFEST.MF
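If you want to double check what the JVM will see as the entry point, the standard java.util.jar classes can read the Main-Class attribute. The following is just a sketch for sanity checking; the class name ManifestCheck is hypothetical and not part of the example project.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestCheck {
    /** Returns the Main-Class entry from manifest text, or null if absent. */
    public static String mainClass(String manifestText) {
        try {
            Manifest manifest = new Manifest(
                new ByteArrayInputStream(manifestText.getBytes(StandardCharsets.UTF_8)));
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            // Inspect a built jar, e.g. target/hello-world-exe.jar
            try (JarFile jar = new JarFile(args[0])) {
                Manifest manifest = jar.getManifest(); // null when the jar has no manifest
                System.out.println(manifest == null ? null
                    : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS));
            }
        } else {
            // Same content as the MANIFEST.MF written by the echo command
            System.out.println(mainClass("Main-Class: org.qnot.example.App\n"));
        }
    }
}
```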

Create a src/assemble directory to store the assembly descriptor files:

$ mkdir src/assemble

Next we’ll create the actual assembly descriptor file which defines how to package up the jar. Create the file src/assemble/exe.xml with the following xml:

<assembly>
  <id>exe</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <outputDirectory></outputDirectory>
      <outputFileNameMapping></outputFileNameMapping>
      <unpack>true</unpack>
      <scope>runtime</scope>
      <includes>
        <include>commons-lang:commons-lang</include>
        <include>commons-cli:commons-cli</include>
      </includes>
    </dependencySet>
  </dependencySets>
  <fileSets>
    <fileSet>
      <directory>target/classes</directory>
      <outputDirectory></outputDirectory>
    </fileSet>
  </fileSets>
</assembly>

Inside the <dependencySets/> element is where you can add all the libraries you’d like to include in the uberjar. These must also be defined in your pom.

Finally, add the maven-assembly-plugin to the pom:

  <build>
    <finalName>hello-world</finalName>
    <plugins>
      <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <configuration>
          <descriptors>
            <descriptor>src/assemble/exe.xml</descriptor>
          </descriptors>
          <archive>
            <manifestFile>src/main/resources/META-INF/MANIFEST.MF</manifestFile>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>

To run the assembly and build the executable jar:

$ mvn assembly:assembly
$ java -jar target/hello-world-exe.jar
Hello World! 

I tested the hello-world example using the latest Maven release (2.0.8) and maven-assembly-plugin-2.2-beta-1. If you run into any issues, try updating your Maven plugins by running:

$ mvn -U compile

You can download the example hello-world project here.

Written by Andrew

2008/01/24 at 20:53

Posted in Java

Rotate Labels JFreeChart

When creating a chart that has rather long labels for the x-axis it is sometimes desirable to rotate them a bit so they fit on the plot. The method to use is setCategoryLabelPositions(..) on the CategoryAxis class. Here’s a quick example:

import java.io.File;
import java.io.IOException;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartUtilities;
import org.jfree.chart.ChartColor;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.plot.CategoryPlot;
import org.jfree.chart.plot.PlotOrientation;
import org.jfree.chart.axis.CategoryLabelPositions;
import org.jfree.chart.axis.CategoryAxis;
import org.jfree.data.category.DefaultCategoryDataset;

public class RotateLabels {
    public static void main(String[] args) {
        DefaultCategoryDataset dataSet = new DefaultCategoryDataset();
        dataSet.addValue(51, "series", "Colonel Forbin");
        dataSet.addValue(92, "series", "The Lizards");
        dataSet.addValue(33, "series", "Wilson");
        dataSet.addValue(77, "series", "Rutherford the Brave");
        dataSet.addValue(37, "series", "The Unit Monster");
        dataSet.addValue(97, "series", "The Famous Mockingbird");
        dataSet.addValue(67, "series", "Poster Nutbag");

        JFreeChart chart = ChartFactory.createBarChart(
            "Gamehendge",
            null,
            null,
            dataSet,
            PlotOrientation.VERTICAL,
            false,
            false,
            false
        );

        CategoryPlot plot = (CategoryPlot)chart.getPlot();
        CategoryAxis xAxis = (CategoryAxis)plot.getDomainAxis();
        xAxis.setCategoryLabelPositions(CategoryLabelPositions.UP_45);

        chart.setBackgroundPaint(ChartColor.WHITE);
        try {
            ChartUtilities.saveChartAsPNG(new File("chart.png"), chart, 400, 300);
        } catch(IOException e) {
            e.printStackTrace();
        }
    }
}

Written by Andrew

2007/08/14 at 19:06

Posted in Java

MySQL bigint types and iBATIS

One nuance I recently ran into while using iBATIS was inserting data into MySQL bigint unsigned columns. iBATIS doesn’t seem to have a way to handle BigInteger data types and throws an exception when attempting to do an insert. Fetching data out seemed to work OK because if iBATIS doesn’t know how to handle a certain type it just returns a java.lang.Object. The way to go about inserting BigInteger types is to set up a type handler. Here’s an example type handler for BigInteger types:

package org.qnot.util;

import java.math.BigDecimal;
import java.math.BigInteger;
import java.sql.SQLException;
import java.sql.Types;

import com.ibatis.sqlmap.client.extensions.ParameterSetter;
import com.ibatis.sqlmap.client.extensions.ResultGetter;
import com.ibatis.sqlmap.client.extensions.TypeHandlerCallback;

public class BigIntegerTypeHandler implements TypeHandlerCallback {

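    public Object getResult(ResultGetter getter) throws SQLException {
        // Read the column first: wasNull() reports on the last value read
        Object o = getter.getObject();
        if(o == null || getter.wasNull()) {
            return null;
        }

        if(o instanceof BigDecimal) {
            // Normalize BigDecimal results to BigInteger
            BigDecimal bd = (BigDecimal)o;
            return bd.toBigInteger();
        } else if(o instanceof BigInteger) {
            return (BigInteger)o;
        } else {
            // Unknown type: hand it back unchanged
            return o;
        }
    }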

    public void setParameter(ParameterSetter setter, Object parameter)
            throws SQLException {
        if (parameter == null) {
            setter.setNull(Types.BIGINT);
        } else {
            BigInteger i = (BigInteger) parameter;
            setter.setBigDecimal(new BigDecimal(i));
        }
    }

    public Object valueOf(String s) {
        // Convert the string form into the type this handler manages
        return s == null ? null : new BigInteger(s);
    }
}
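For context on why BigInteger comes up at all: an unsigned 64-bit column can hold values up to 2^64 - 1, which is larger than a Java long can represent. Here is a small standalone sketch showing that, along with the BigInteger to BigDecimal round trip the handler relies on. The class name UnsignedBigint is made up for the example.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class UnsignedBigint {
    // Largest value of an unsigned 64-bit column: 2^64 - 1
    public static final BigInteger MAX_UNSIGNED =
        BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE);

    /** True when the value can be held in a signed Java long without loss. */
    public static boolean fitsInLong(BigInteger value) {
        return value.bitLength() <= 63;
    }

    public static void main(String[] args) {
        System.out.println(MAX_UNSIGNED);             // prints 18446744073709551615
        System.out.println(fitsInLong(MAX_UNSIGNED)); // prints false

        // The same conversion pair used in setParameter and getResult
        BigDecimal asDecimal = new BigDecimal(MAX_UNSIGNED);
        System.out.println(asDecimal.toBigInteger().equals(MAX_UNSIGNED)); // prints true
    }
}
```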

Written by Andrew

2007/07/16 at 13:22

Posted in Java
