The Broad Institute of MIT and Harvard was launched in 2004 to improve human health by using genomics to advance our understanding of the biology and treatment of human disease, and to help lay the groundwork for a new generation of therapies.
The Hail team's mission is to build tools to enable rapid analysis and exploration of massive genetic datasets (tens of terabytes and tripling yearly). We are committed to open science, and everything we do is open source. We currently develop in Scala, Spark, Python, and C/C++ but will use any tools we need to get the job done.
We're looking for skilled engineers who have a solid CS/engineering background, can quickly write clear, correct code and, for the senior position, have experience working on large, complex projects. You don't need experience in biology or our particular technologies. We work in a highly multi-disciplinary environment with biologists, bioinformaticians, doctors, mathematicians, and operations. Self-improvement is a fundamental part of our culture; we want to grow great engineers. You must be excited to be challenged and learn new things.
Like particle physics, astronomy, and tech before it, biology has firmly entered the fourth paradigm of data-intensive science, in which we measure everything and run computational experiments on the data. Genetic datasets for disease association studies now run in the tens of terabytes, doubling every eight months. RNA-sequencing datasets measuring gene expression at single-cell resolution are measured in gigabytes but are doubling far faster in the quest for a Human Cell Atlas. Much of this data comes from the Broad Genomics Platform, the largest producer of human genomics information in the world.
With such staggering advances and investment in high-throughput perturbation and measurement, technical barriers to discovery are rapidly shifting from biological to computational. We believe there is a unique opportunity to transform the practice of computational biology by applying deep ideas from computer science and mathematics to build the next generation of modular, scalable tools for analyzing massive genetic and biological data. These tools will drive the development of new treatments and biotechnologies and fundamentally advance our understanding of life itself.
Two of us (Cotton Seed and Jon Bloom) co-founded the Hail project in the Neale lab in Fall 2015 to help the genetics community harness the flood of sequenced genomes and unravel the genetic architecture of disease. Our open-source framework is already being used to analyze the largest genetic datasets in existence, to power dozens of major academic studies, and to meet the exploding needs of hospitals, diagnostic labs, and industry. We're now building a new Initiative in Scalable Analytics to grow the Hail project within and beyond genetics and to reduce the latency of computational experiments in biomedical research.
Operationally, we're a software team embedded inside the world's leading biomedical and genomics research institute, anchoring the global heart of biotech right across from MIT. We implement distributed algorithms on top of our custom-built language, compiler, and run-time system to support querying, aggregation, and linear algebra on hundreds of thousands of human genomes. We thrive on diverse challenges: language and compiler design, low-level performance optimization, architecture of distributed systems, scaling of established methods and invention of new ones, visualization, interoperability with other powerful tools, and close collaboration with the amazing science and scientists all around us (this is super fun!).
In addition to software development, our team is interested in the application of machine learning to problems in biology. We run the Models, Inference & Algorithms Initiative to foster community and pedagogy in greater Boston at the interface of computational biology, mathematical theory, and computer science.
If you’d like a video intro for engineers on what we’re building and why, check out the talks on our homepage. If you’re curious how Hail is used, check out the tutorials. And if you think you might enjoy applying your engineering skills to accelerate our understanding of disease, send any questions to email@example.com and apply soon!