  • Multiple Python Logging formats

    I have been writing a Python module that uses threading, and I wanted it to have a specific logging format so that messages from separate threads could be told apart. However, when the module was imported, it inherited whatever root logger format had been configured before it.


    26 Sep 2019
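
    One way the situation in this excerpt could be handled is sketched below: the module gets its own named logger, its own handler, and a formatter that includes the thread name, and propagation to the root logger is switched off. The names `make_module_logger`, `demo`, and `mymodule`, and the format string, are illustrative assumptions, not taken from the post itself.

```python
import io
import logging
import threading


def make_module_logger(name, stream):
    """Return a logger whose format is independent of the root logger."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler(stream)
    # %(threadName)s makes messages from different threads distinguishable.
    handler.setFormatter(
        logging.Formatter("[%(threadName)s] %(levelname)s: %(message)s")
    )
    logger.handlers.clear()      # avoid stacking handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False     # keep records away from the root handlers
    return logger


def demo():
    # Simulate an importing application that configured the root logger first.
    logging.basicConfig(format="ROOT %(message)s")
    buffer = io.StringIO()
    log = make_module_logger("mymodule", buffer)  # hypothetical module name
    worker = threading.Thread(target=log.info, args=("hello",), name="worker-1")
    worker.start()
    worker.join()
    return buffer.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

    Setting `propagate = False` is the design choice that matters here: without it, each record would also reach the root logger's handlers and be printed a second time in the root format.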

  • Interrupting Python Threads

    In my most recent project, rgc, I have been using the Python threading library for concurrent operations. Python threads are often overlooked because the Python GIL (global interpreter lock) allows only one thread to run Python code at a time, but they are great for scaling I/O or subprocess calls without worrying about communication.


    21 Dec 2018
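
    As a minimal sketch of the point about I/O-bound work, the example below starts one thread per task and lets the blocking waits overlap. `time.sleep` stands in for a real I/O or subprocess call (an assumption made for illustration), and `fake_io` and `run_all` are hypothetical names.

```python
import threading
import time


def fake_io(task_id, results, delay=0.05):
    """Stand-in for a blocking I/O or subprocess call."""
    time.sleep(delay)            # sleeping releases the GIL, as blocking I/O does
    results[task_id] = task_id * 2


def run_all(task_ids):
    """Run every task in its own thread and collect the results by task id."""
    results = {}
    threads = [
        threading.Thread(target=fake_io, args=(task_id, results))
        for task_id in task_ids
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()            # wait for all workers before reading results
    return results


if __name__ == "__main__":
    print(run_all(range(4)))
```

    Because each thread spends its time waiting rather than computing, the waits overlap instead of running back to back.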

  • Multiprocessing Size and Rank

    I have always thought Python did a great job exposing parallel processing with the multiprocessing package. The Pool class in particular made it relatively simple to jump from the built-in map function, which is a good first step to accelerating loops, to utilizing all cores on a processor without any obscure hoops.


    26 May 2018
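
    The jump from the built-in `map` to `Pool.map` described in this excerpt might look like the sketch below. The function names `square` and `parallel_squares` are hypothetical, and the `__main__` guard is included because start methods that re-import the main module need it.

```python
from multiprocessing import Pool


def square(value):
    """A small CPU-bound function standing in for real per-item work."""
    return value * value


def parallel_squares(values, processes=None):
    """Apply square() across worker processes; None uses all available cores."""
    with Pool(processes=processes) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    data = range(10)
    serial = list(map(square, data))     # first step: built-in map
    parallel = parallel_squares(data)    # second step: Pool.map
    print(serial == parallel)
```

    `Pool.map` returns results in input order, so it can usually replace `list(map(...))` directly when the function and its arguments can be pickled.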

  • Generating Different Hash Functions

    Representing genetic sequences using k-mers, the biological equivalent of n-grams, is a great way to numerically summarize a linear sequence. Depending on how unique you need your k-mers to be, you may exhaust your system memory trying to keep track of all 4^k possibilities, where 4 is the number of possible bases (A, G, C, T) and k is the length of each string. To circumvent this constraint, Bloom filters were designed to probabilistically track the presence (not the count) of items.


    05 Feb 2018
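
    To make the idea concrete, here is a minimal Bloom filter sketch for k-mers. It derives several hash functions by prefixing a seed to the item before hashing with SHA-256. That seeding scheme, the class and function names, and the default sizes are all assumptions chosen for illustration, not necessarily the approach the post takes.

```python
import hashlib


def kmers(sequence, k):
    """Return every overlapping substring of length k."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


class BloomFilter:
    """Probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, n_bits=1024, n_hashes=3):
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(n_bits)   # one byte per bit, for readability

    def _positions(self, item):
        # A different seed prefix yields a different hash function.
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.n_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def __contains__(self, item):
        return all(self.bits[position] for position in self._positions(item))


if __name__ == "__main__":
    seen = BloomFilter()
    for kmer in kmers("GATTACA", 3):
        seen.add(kmer)
    print("GAT" in seen)
```

    The filter's memory use is fixed at `n_bits` no matter how large 4^k grows. The cost is that a membership test can report a k-mer that was never added.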

  • Using your own cluster with CyVerse

    The CyVerse SDK currently guides developers through the process of creating apps that run on TACC supercomputers, which requires both a TACC account and an active allocation. Users without an active allocation can request to be added to the iPlant-Collabs allocation, which allows developers to prototype CyVerse applications. I stress the word prototype because the iPlant-Collabs allocation is deliberately kept small to make sure unvetted apps aren’t burning away all the compute time allotted for CyVerse. This methodology is great for generating new apps, but the apps need to be reviewed and published by administrators before they can be run at scale on CyVerse. If a user already has access to a high-performance cluster at their own institution, they can circumvent the constraints of iPlant-Collabs by registering their own executionSystem with the Agave API and running apps on it from the CyVerse Discovery Environment.


    07 Feb 2017