• Falcon-Zero: Maximum Velocity

    I previously demonstrated how to complete an assembly of the E. coli genome in 54 minutes using Stampede at TACC and CyVerse. While convenient, this is fairly slow given the resources provided relative to the input size. Out of the box, Falcon can run concurrent tasks on multiple compute nodes. However, this is done by submitting each task as a separate batch job to the system scheduler. A system scheduler is not meant to be a high-performance load balancer, but a fair way to schedule variable-sized workloads. Our systems, which are available to local UT-system, national NSF, and global collaborating researchers, are often oversubscribed and subject to fairly long waiting times. This makes it impractical to assume that hundreds or thousands of job submissions can be run to accomplish a given task.


    25 Aug 2016

  • Falcon in the NSF Cloud!

    I am proud to announce the release of an optimized version of the Falcon assembler with the help of Cyrus Proctor and Jawon Song at the Texas Advanced Computing Center. The Falcon diploid assembler is an exciting tool for biological research and allows for the assembly of complex genomes in record time. The tool is currently accessible on the CyVerse Discovery Environment and will be deployed as a standard module on the Stampede supercomputer in an upcoming maintenance cycle.


    29 Apr 2016

  • Up Goer Five - DNA Replication

    I decided to answer the call of http://splasho.com/upgoer5/, which is inspired by XKCD’s Up Goer Five


    19 Aug 2015

  • Calculating Methylation Frequency with BSMAP Reads

    My preferred program for aligning bisulfite-sequencing reads to a reference is BSMAP. BSMAP is based on SOAP and aligns reads fairly quickly considering the variability that bisulfite treatment introduces. While there are other fast BS-Seq aligners (GSNAP, Bismark, BS Seeker), I prefer BSMAP because it comes with the script methratio.py to post-process the aligned reads for quick interpretation. The script parses all aligned reads and produces a tabulated output similar to VCF (Variant Call Format), which includes not only the methylation frequency but also the base representation from both strands. Below is an example of the output from methratio.py.
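    Separately from the methratio.py output itself, the frequency at a single cytosine can be thought of as a simple ratio. This is only a minimal sketch, assuming the usual bisulfite logic that reads still showing C are methylated and reads showing T are converted; the function name and its two count arguments are hypothetical and do not mirror the columns or internals of methratio.py.

```python
def methylation_frequency(c_count, t_count):
    """Fraction of reads covering a cytosine that still read as C.

    c_count: reads showing C (assumed methylated, unconverted)
    t_count: reads showing T (assumed unmethylated, converted)
    Both names are illustrative assumptions, not methratio.py fields.
    """
    total = c_count + t_count
    if total == 0:
        # No coverage at this position, so the frequency is undefined.
        return None
    return c_count / total


if __name__ == "__main__":
    # 8 unconverted reads out of 10 covering reads
    print(methylation_frequency(8, 2))
```

    Returning None for uncovered positions keeps them distinguishable from positions that are covered but fully unmethylated.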


    10 Jun 2014

  • Functions on a Google Documents Table

    Today I had to make an expense report and decided to use a Google document instead of a spreadsheet so I could embed images of all receipts as I went. I knew Google documents had tables and figured that they had simple functions like SUM. They did not. What they did have was Google Apps Script found under Tools -> Script Editor. I had a hard time with the tutorials, but I’m sure this project is still maturing and might be dropped from Google’s offerings any day. However, I did manage to make a script (with a button!) that sums dollar values in a table and prints the total in the last row.
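    The summing step can be sketched outside of Google Apps Script. The following is a rough Python illustration, not the script described above: it assumes the table has already been read into a list of rows of cell strings, and that a dollar value is any cell starting with "$". The name sum_dollar_cells and the sample table are hypothetical.

```python
import re
from decimal import Decimal

# A cell counts as a dollar value only if it starts with "$",
# e.g. "$12.50" or "$1,200.00". Plain numbers are ignored.
DOLLAR = re.compile(r"^\$\s*(\d{1,3}(?:,\d{3})*|\d+)(\.\d{1,2})?$")


def sum_dollar_cells(rows):
    """Add up every dollar-formatted cell in a table given as rows of strings."""
    total = Decimal("0")
    for row in rows:
        for cell in row:
            text = cell.strip()
            if DOLLAR.match(text):
                # Drop the "$", any space after it, and thousands separators.
                amount = text.lstrip("$").strip().replace(",", "")
                total += Decimal(amount)
    return total


if __name__ == "__main__":
    table = [
        ["Item", "Cost"],
        ["Lunch", "$12.50"],
        ["Taxi", "$30.00"],
        ["Hotel", "$1,200.00"],
    ]
    print(f"${sum_dollar_cells(table):,.2f}")
```

    Decimal is used instead of float so that cents add up exactly, which matters for an expense report.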


    09 Jun 2014