All functionality described on this page is experimental and subject to change.
This page describes genetic datasets that are hosted in public buckets on both Google Cloud Storage and Amazon S3. Note that these datasets are stored in Requester Pays buckets on GCS, and are available in both the US and EU regions. On AWS, the datasets are shared via Open Data on AWS and are in buckets in the US region.
Check out the
load_dataset() function to see how to load one of these
datasets into a Hail pipeline. You will need to provide the name, version, and
reference genome build of the desired dataset, as well as specify the region
your cluster is in and the cloud platform. Egress charges may apply if your
cluster is outside of the region specified.
Schemas for Available Datasets
|name||description||version||reference genome||cloud: [regions]|