• Up Goer Five - DNA Replication

    I decided to answer the call of http://splasho.com/upgoer5/, which is inspired by XKCD’s Up Goer Five.


    19 Aug 2015

  • Calculating Methylation Frequency with BSMAP Reads

    My preferred program for aligning bisulfite-sequencing reads to a reference is BSMAP. BSMAP is based on SOAP and aligns reads fairly quickly considering the variability that bisulfite treatment introduces. While there are other fast BS-Seq aligners (GSNAP, Bismark, BS Seeker), I prefer BSMAP because it comes with the script methratio.py, which post-processes the aligned reads for quick interpretation. The script parses all aligned reads and produces tabular output similar to VCF (Variant Call Format) that includes not only the methylation frequency but also the base representation from both strands. Below is an example of the output from methratio.py.
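    This is not the methratio.py output itself, only a minimal Python sketch of the kind of calculation a methylation frequency implies. It assumes the usual bisulfite logic (a methylated cytosine still reads as C, an unmethylated one reads as T) and ignores the strand information mentioned above. The function name and the example site tuples are hypothetical, not taken from BSMAP.

```python
def methylation_frequency(c_count, t_count):
    """Return the fraction of reads supporting methylation at one cytosine.

    c_count: reads showing C (assumed protected, i.e. methylated)
    t_count: reads showing T (assumed converted, i.e. unmethylated)
    Returns None when no reads cover the site.
    """
    depth = c_count + t_count
    if depth == 0:
        return None
    return c_count / depth


# Hypothetical per-site counts: (chromosome, position, C reads, T reads)
example_sites = [
    ("chr1", 1001, 8, 2),
    ("chr1", 1050, 0, 10),
    ("chr2", 77, 0, 0),
]

for chrom, pos, c_reads, t_reads in example_sites:
    freq = methylation_frequency(c_reads, t_reads)
    label = "NA" if freq is None else f"{freq:.2f}"
    print(chrom, pos, label, sep="\t")
```

    Sites with no coverage are reported as NA rather than 0, so that an uncovered cytosine is not mistaken for an unmethylated one.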


    10 Jun 2014

  • Functions on a Google Documents Table

    Today I had to make an expense report and decided to use a Google document instead of a spreadsheet so I could embed images of all receipts as I went. I knew Google documents had tables and figured that they had simple functions like SUM. They did not. What they did have was Google Apps Script, found under Tools -> Script Editor. I had a hard time with the tutorials, and I suspect the project is still maturing and might be dropped from Google’s offerings any day. However, I did manage to make a script (with a button!) that sums dollar values in a table and prints the total in the last row.
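    The script itself is Google Apps Script and is not shown in this excerpt. As a rough, language-neutral illustration of the summing step only, here is a Python sketch that assumes the table has already been read into a list of rows of strings. The names `parse_dollars` and `total_row` are hypothetical, and nothing here calls the Google Docs API.

```python
import re
from decimal import Decimal

# Matches cells such as "$12.50" or "$1,200.00"; anything else is ignored.
DOLLAR_CELL = re.compile(r"^\s*\$\s*(-?[\d,]+(?:\.\d{1,2})?)\s*$")


def parse_dollars(cell):
    """Return the cell's dollar amount as a Decimal, or None if it is not one."""
    match = DOLLAR_CELL.match(cell)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def total_row(rows, column=-1):
    """Sum every dollar value found in one column and format the total."""
    total = Decimal("0")
    for row in rows:
        amount = parse_dollars(row[column])
        if amount is not None:
            total += amount
    return f"${total:,.2f}"


# Hypothetical expense table; the header row is skipped because it has no "$".
table = [
    ["Item", "Cost"],
    ["Taxi", "$12.50"],
    ["Hotel", "$1,200.00"],
]
print(total_row(table))
```

    Decimal is used instead of float so that cents add up exactly.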


    09 Jun 2014


    To better enable researchers to analyze their own fastq files without downloading additional software, I created FASTQA-JS. Base quality scores often drop off towards the end of a read as errors accumulate in the clusters of DNA on a flowcell. It is common practice to analyze these quality scores and truncate the reads where the scores fall below a specified threshold. Below is an example of the output from FASTQA-JS.
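    FASTQA-JS itself is JavaScript and its output is not reproduced in this excerpt. Separately, the truncation step described above can be sketched in Python. This sketch assumes Phred+33 encoded quality strings and cuts each read at the first base whose score falls below the threshold; the function names and the default threshold of 20 are illustrative assumptions, not FASTQA-JS settings.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def truncate_read(sequence, quality_line, threshold=20, offset=33):
    """Cut a read at the first base whose quality is below the threshold.

    Returns the (sequence, quality) pair, shortened together so they stay aligned.
    """
    for index, score in enumerate(phred_scores(quality_line, offset)):
        if score < threshold:
            return sequence[:index], quality_line[:index]
    return sequence, quality_line


# Hypothetical read: 'I' decodes to Q40 and '#' decodes to Q2.
seq, qual = truncate_read("ACGTACGT", "IIIII#II")
print(seq, qual)
```

    Other trimming strategies exist, such as sliding-window averages; cutting at the first low-quality base is simply the easiest to show.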


    29 May 2014

  • Exponential Growth of NCBI Genomes

    I have been studying bioinformatics for four years and have had the privilege of working in a sequencing lab, where I personally saw the technology change and accelerate. When I started work at the CGB in 2010, a whole genome cost around $10,000 to sequence on an Illumina GA II machine. In 2013, the IU Bioinformatics Club resequenced a human genome for $4,000. Now Google’s Calico is pushing for the $1,000 human genome. This price point makes whole genome resequencing more affordable than the average trip to the hospital, making personalized medicine very attractive.


    31 Mar 2014