Greg Zynda - Home

Comparing DDL and NCCL Horovod performance

My preferred framework for distributed machine learning is Horovod. Horovod makes it simple to process multiple batches of data across multiple compute nodes with minimal code changes, and it works with multiple libraries. The communication is fairly lightweight since it only broadcasts updates and aggregates losses, so the speedup is almost linear.
Read more...
13 May 2020
Exposing conda with LMOD

The Longhorn GPU cluster at the Texas Advanced Computing Center consists of 108 IBM Power System AC922 nodes, each with 4 NVIDIA V100 GPUs. In addition to general GPU-accelerated applications, this system is meant to support machine learning. If you go to PyPI, you will discover that there are official x86_64 builds of tensorflow-gpu, but none for the PowerPC architecture. Even when the architecture matches the system, you still have to hope the package was compiled against your version of CUDA, since the CUDA version is omitted from the package name.
Read more...
30 Apr 2020
Comparing loss values of different data sizes

While running the hyperparameter optimization of a model, where one of the parameters was the actual data size, I realized that I didn’t know whether loss values calculated from different data sizes were comparable. I knew that different loss metrics could not be compared, but I was not sure whether different data sizes affected the final value.
Read more...
22 Jan 2020
Finding contiguous region coordinates with Python

Bioinformatics often deals with sequential data laid out on a 1-dimensional genomic coordinate system. Since these data signals are often compared against functional regions in genome annotations, it is often necessary to identify contiguous regions of interest. I have never come across a function built into NumPy or SciPy to accomplish this, but I was inspired by two Stack Overflow posts:
Read more...
29 Nov 2019
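
For readers who want a feel for the contiguous-region problem described above, here is a minimal pure-Python sketch. It is not the approach from the post or from the Stack Overflow answers it mentions; the function name `contiguous_regions`, the example `signal`, and the threshold of 1 are all hypothetical, chosen only for illustration. It assumes the input is a 1-dimensional sequence of boolean-like values and reports each run of truthy values as a half-open `(start, end)` pair.

```python
from itertools import groupby


def contiguous_regions(mask):
    """Return half-open (start, end) index pairs for each run of truthy values.

    `mask` is any 1-dimensional iterable; values are interpreted with bool().
    """
    regions = []
    position = 0
    # groupby yields consecutive runs of equal keys (True runs and False runs)
    for is_set, run in groupby(mask, key=bool):
        length = sum(1 for _ in run)
        if is_set:
            regions.append((position, position + length))
        position += length
    return regions


# Hypothetical signal along a 1-dimensional coordinate; keep positions with value > 1
signal = [0, 3, 5, 0, 0, 2, 2, 2, 0, 7]
regions = contiguous_regions(value > 1 for value in signal)
print(regions)  # [(1, 3), (5, 8), (9, 10)]
```

Half-open intervals are used here because they match Python slicing, so `signal[start:end]` returns exactly the values in each region.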
Mitigating a memory leak in TensorFlow's LSTM

I have been running a parameter sweep on a recurrent neural network (RNN) consisting of long short-term memory (LSTM) layers, and most of my long runs would eventually fail after being unable to allocate additional memory.
Read more...
17 Oct 2019