Visit to the Broad Institute

I had an interesting visit to the Broad Institute today, at the invitation of Clare Bernard (who got her Ph.D. in our ATLAS group at BU) and her colleague Christopher Farnham (also with a BU background). I told them all about MGHPCC and NESE. As usual, they were aware of MGHPCC, but only in a general sort of way. Broad is an MIT/Harvard outfit. James Cuff used to work there. They know of James and Steven Litster, and have talked with Chris Hill and (I think) John Goodhue as well. Even with no Biology background, I'm aware that Broad people co-invented CRISPR (https://en.wikipedia.org/wiki/CRISPR) and that Broad is one of the very top biotech places in the whole world.

A few items I picked up on…

  • They have a giant sequencing facility a few blocks from where we were in Cambridge. Their sequencing pipelines are routinely processing 25TB per day and 100,000 “samples”. They do most of their computing in the Google Cloud.

  • They are one of the main (perhaps the main) developers of GATK, the Genome Analysis Toolkit, which is used by the whole world of biology (60,000 users, they say). They do a lot of work with Intel and Google via their "Methods Team".

  • Other software that they use: Cromwell, WDL ("Whittle"), Scala. They are interested in the usual AI/Data Science stack (Jupyter Notebooks, TensorFlow, etc.), much like what Jeremy Kepner does on the MIT Supercloud, for instance. A lot of what the Methods Team does is produce toolkits or full working environments for other researchers to use, sometimes via download, but more often through web interfaces that hide the complexities of cloud back ends, in the style that Scott often talks about. They share all the code they develop freely, but naturally some things are internal, and they haven't put in the extra effort to turn those into supported, documented products.

  • Somewhat to my surprise, they need to work quite close to the computing hardware to get what they need. In fact, they are starting to use specialized computing hardware for sequencing, including some from a company called Illumina. They have a rack installed at 1SS, but may need to expand.

  • They’re interested in following along with the NESE project, so I’m pointing them at our web & social media.

Broad Institute

  • Saul