• Comparing DDL and NCCL Horovod performance

    My preferred framework for distributed machine learning is Horovod. Horovod makes it simple to process multiple batches of data across multiple compute nodes with minimal code changes, and it works with multiple libraries. The communication is fairly lightweight since it only broadcasts updates and aggregates losses, so the speedup is almost linear.


    13 May 2020

  • Exposing conda with LMOD

    The Longhorn GPU cluster at the Texas Advanced Computing Center comprises 108 IBM Power System AC922 nodes, each with 4 NVIDIA V100 GPUs. In addition to general GPU-accelerated applications, this system is meant to support machine learning. If you go to PyPI, you will discover that there are official x86_64 builds of tensorflow-gpu, but none for the PowerPC architecture. Even if the architecture matched the system, you would still have to hope the package was compiled against your version of CUDA, since the CUDA version is omitted from the package name.


    30 Apr 2020

  • Comparing loss values of different data sizes

    While running the hyperparameter optimization of a model, where one of the parameters was the actual data size, I realized that I didn’t know if loss values calculated from different data sizes were comparable. I knew that different loss metrics could not be compared, but I was not sure if different data sizes affected the final value.


    22 Jan 2020

  • Finding contiguous region coordinates with Python

    Bioinformatics often deals with sequential data laid out on a 1-dimensional genomic coordinate system. Since these data signals are often compared against functional regions in genome annotations, it is often necessary to identify contiguous regions of interest. I have never come across a function built into numpy or scipy to accomplish this, but I was inspired by two Stack Overflow posts:
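
    For illustration, here is a minimal NumPy sketch of one way to find contiguous regions in a 1-dimensional signal. It is not necessarily the approach from those posts: the function name `contiguous_regions`, the threshold, and the sample signal are all hypothetical and chosen only for this example. It assumes the regions of interest can be expressed as a boolean mask, and it returns half-open `[start, end)` index pairs.

```python
import numpy as np


def contiguous_regions(mask):
    """Return an (n, 2) array of half-open [start, end) indices, one row per run of True."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs touching either edge still get a start and an end.
    padded = np.concatenate(([False], mask, [False]))
    # The difference is +1 where a run begins and -1 one position past where it ends.
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return np.column_stack((starts, ends))


# Hypothetical signal along a coordinate axis; positions with a value above 1 are "of interest".
signal = np.array([0, 3, 5, 0, 0, 7, 0, 2, 2])
regions = contiguous_regions(signal > 1)
print(regions.tolist())  # [[1, 3], [5, 6], [7, 9]]
```

    Half-open coordinates are used here because they match Python slicing, so `signal[start:end]` recovers each region directly.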


    29 Nov 2019

  • Mitigating a memory leak in Tensorflow's LSTM

    I have been running a parameter sweep on a recurrent neural network (RNN) consisting of long short-term memory (LSTM) layers, and most of my long runs would eventually fail after being unable to allocate additional memory.


    17 Oct 2019