Powering genomic analysis, at every scale

An open-source library for scalable genomic data exploration
GWAS with Hail
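
A minimal sketch of what a genome-wide association study can look like in Hail. It assumes a stored MatrixTable with biallelic variants, a `GT` entry field, columns keyed by `s`, and a numeric column field holding the phenotype. The function name `run_gwas`, its parameters, and the field name `phenotype` are illustrative choices, not part of Hail.

```python
def run_gwas(mt_path, phenotype_field="phenotype", min_af=0.01, n_pcs=3):
    """Sketch of a linear-regression GWAS on a stored MatrixTable."""
    # Imported inside the function so that defining it does not require Hail.
    import hail as hl

    mt = hl.read_matrix_table(mt_path)

    # Keep variants whose alternate-allele frequency exceeds min_af
    # (assumes biallelic variants, so AF[1] is the alternate allele).
    mt = mt.annotate_rows(call_stats=hl.agg.call_stats(mt.GT, mt.alleles))
    mt = mt.filter_rows(mt.call_stats.AF[1] > min_af)

    # Use principal components of the genotypes as ancestry covariates
    # (assumes the columns are keyed by a field named `s`).
    _, scores, _ = hl.hwe_normalized_pca(mt.GT, k=n_pcs)
    mt = mt.annotate_cols(pcs=scores[mt.s].scores)

    # Regress the phenotype on alternate-allele count, one test per variant.
    return hl.linear_regression_rows(
        y=mt[phenotype_field],
        x=mt.GT.n_alt_alleles(),
        covariates=[1.0] + [mt.pcs[i] for i in range(n_pcs)],
    )
```

The returned table has one row per variant. Its `p_value` field can be passed to `hl.plot.manhattan` to draw a Manhattan plot.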

Install

pip install hail

Hail requires Python 3 and the Java 8 JRE.

GNU/Linux will also need the C and C++ standard libraries if they are not already installed.

Detailed instructions

Features

Simplified Analysis

Hail is an open-source Python library that simplifies genomic data analysis. It provides powerful, easy-to-use data science tools for interrogating even biobank-scale genomic data (e.g., UK Biobank, gnomAD, TOPMed, FinnGen, and BioBank Japan).

Genomic Dataframes

Modern data science is driven by numeric matrices (see NumPy) and tables (see R and pandas). While sufficient for many tasks, none of these tools adequately captures the structure of genetic data. Genetic data combines multiple axes (variants and samples), like a matrix, with structured entries (genotypes), like a table or dataframe. To support genomic analysis, Hail introduces MatrixTable, a powerful, distributed data structure that combines features of matrices and dataframes.
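
To make the idea concrete, here is a toy, in-memory model of a two-axis table with structured entries. It is a sketch for illustration only: `ToyMatrixTable` and `alt_allele_frequency` are hypothetical names, and Hail's actual MatrixTable is distributed and far richer than this.

```python
from dataclasses import dataclass


@dataclass
class ToyMatrixTable:
    rows: list     # one dict of fields per variant (row axis)
    cols: list     # one dict of fields per sample (column axis)
    entries: list  # entries[i][j] is a dict of fields for variant i, sample j

    def filter_rows(self, keep):
        """Keep variants for which keep(row_fields, row_entries) is true."""
        kept = [i for i, r in enumerate(self.rows) if keep(r, self.entries[i])]
        return ToyMatrixTable(
            rows=[self.rows[i] for i in kept],
            cols=self.cols,
            entries=[self.entries[i] for i in kept],
        )


def alt_allele_frequency(row_entries):
    """Fraction of alternate alleles among the diploid calls in one row."""
    calls = [e["GT"] for e in row_entries]
    return sum(sum(gt) for gt in calls) / (2 * len(calls))


toy = ToyMatrixTable(
    rows=[{"locus": "1:1000", "alleles": ["A", "G"]},
          {"locus": "1:2000", "alleles": ["C", "T"]}],
    cols=[{"s": "sample1"}, {"s": "sample2"}],
    entries=[[{"GT": (0, 1)}, {"GT": (1, 1)}],
             [{"GT": (0, 0)}, {"GT": (0, 0)}]],
)

# Aggregate over the sample axis to filter along the variant axis.
common = toy.filter_rows(lambda row, entries: alt_allele_frequency(entries) > 0.01)
print([r["locus"] for r in common.rows])  # ['1:1000']
```

The filter aggregates the structured entries across samples to decide which variants to keep. That operation is awkward to express with either a plain numeric matrix or a flat table alone.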

Input Unification

The Hail MatrixTable unifies a wide range of input formats (e.g., VCF, BGEN, PLINK, TSV, GTF, and BED files) and supports scalable queries, even on petabyte-scale datasets. By leveraging MatrixTable, Hail provides an integrated, scalable analysis platform for science.
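
As a sketch of how several genotype formats can end up as the same kind of object, the helper below picks a Hail importer from the file name. The importers `hl.import_vcf`, `hl.import_bgen`, and `hl.import_plink` are Hail functions; `detect_format` and `import_genotypes` are hypothetical helpers written for this example, and the extension-based dispatch is an assumption rather than something Hail does for you.

```python
def detect_format(path):
    """Guess a genotype file format from its name (hypothetical helper)."""
    name = str(path).lower()
    if name.endswith((".vcf", ".vcf.bgz")):
        return "vcf"
    if name.endswith(".bgen"):
        return "bgen"
    if name.endswith(".bed"):
        # Treated as a PLINK binary file here, not a UCSC BED interval file.
        return "plink"
    raise ValueError(f"unrecognized genotype format: {path}")


def import_genotypes(path):
    """Load a VCF, BGEN, or PLINK file as a Hail MatrixTable."""
    # Imported inside the function so detect_format works without Hail.
    import hail as hl

    path = str(path)
    fmt = detect_format(path)
    if fmt == "vcf":
        return hl.import_vcf(path)
    if fmt == "bgen":
        # Assumes an index was already built with hl.index_bgen(path).
        return hl.import_bgen(path, entry_fields=["GT"])
    # PLINK: assumes the .bim and .fam files sit next to the .bed file.
    prefix = path[: -len(".bed")]
    return hl.import_plink(bed=path, bim=prefix + ".bim", fam=prefix + ".fam")
```

Whichever branch runs, the result is a MatrixTable, so downstream filtering and aggregation code does not need to know the original format. Tabular inputs such as TSV files are read with `hl.import_table`, which returns a Hail Table rather than a MatrixTable.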


Acknowledgments

The Hail team has several sources of funding at the Broad Institute:

  • The Stanley Center for Psychiatric Research, which together with the Neale Lab has provided an incredibly supportive and stimulating home.
  • Principal Investigator Benjamin Neale, whose scientific leadership has been essential for solving the right problems.
  • Principal Investigator Daniel MacArthur and the other members of the gnomAD council.
  • Jeremy Wertheimer, whose strategic advice and generous philanthropy have been essential for growing the impact of Hail.

We are grateful for generous support from:

  • The National Institute of Diabetes and Digestive and Kidney Diseases
  • The National Institute of Mental Health
  • The National Human Genome Research Institute

We are grateful for generous past support from:

  • The Chan Zuckerberg Initiative

We would like to thank Zulip for supporting open source by providing free hosting, and YourKit, LLC for generously providing free licenses for the YourKit Java Profiler for open-source development.