Methods

Import / Export

export_elasticsearch(t, host, port, index, …) Export a Table to Elasticsearch.
export_gen(dataset, output[, precision, gp, …]) Export a MatrixTable as GEN and SAMPLE files.
export_plink(dataset, output[, call, …]) Export a MatrixTable as PLINK2 BED, BIM and FAM files.
export_vcf(dataset, output[, …]) Export a MatrixTable as a VCF file.
get_vcf_metadata(path) Extract metadata from a VCF header.
import_bed(path[, reference_genome, …]) Import a UCSC BED file as a Table.
import_bgen(path, entry_fields[, …]) Import BGEN file(s) as a MatrixTable.
import_fam(path[, quant_pheno, delimiter, …]) Import a PLINK FAM file into a Table.
import_gen(path[, sample_file, tolerance, …]) Import GEN file(s) as a MatrixTable.
import_locus_intervals(path[, …]) Import a locus interval list as a Table.
import_matrix_table(paths[, row_fields, …]) Import tab-delimited file(s) as a MatrixTable.
import_plink(bed, bim, fam[, …]) Import a PLINK dataset (BED, BIM, FAM) as a MatrixTable.
import_table(paths[, key, min_partitions, …]) Import delimited text file(s) (text tables) as a Table.
import_vcf(path[, force, force_bgz, …]) Import VCF file(s) as a MatrixTable.
index_bgen(path[, index_file_map, …]) Index BGEN files as required by import_bgen().
read_matrix_table(path[, _drop_cols, _drop_rows]) Read in a MatrixTable written with MatrixTable.write().
read_table(path) Read in a Table written with Table.write().
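get_vcf_metadata reads structured header lines of the form ##INFO=<ID=DP,Number=1,Type=Integer,Description="...">, as defined by the VCF specification. The dependency-free sketch below shows roughly what such parsing involves: the values inside <...> are comma-separated key=value pairs, and a double-quoted Description may itself contain commas. It is not Hail's implementation; the function name parse_vcf_meta and its return layout (lowercase category, then ID, then attributes) are assumptions chosen for illustration.

```python
import re

# One key=value pair inside <...>; a value is either a double-quoted string
# (which may contain commas and escaped quotes) or a run of non-comma characters.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')
# A structured INFO, FORMAT or FILTER meta-information line.
_META = re.compile(r'##(INFO|FORMAT|FILTER)=<(.*)>\s*$')


def parse_vcf_meta(lines):
    """Group structured VCF header lines by category (lowercased) and ID."""
    meta = {"info": {}, "format": {}, "filter": {}}
    for line in lines:
        match = _META.match(line)
        if match is None:
            continue  # skip ##fileformat, ##contig, the #CHROM line, and so on
        fields = {}
        for key, value in _PAIR.findall(match.group(2)):
            if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
                value = value[1:-1]  # strip the surrounding quotes
            fields[key] = value
        # The VCF specification requires an ID on these lines.
        meta[match.group(1).lower()][fields.pop("ID")] = fields
    return meta
```

Unstructured lines such as ##fileformat are skipped here; a fuller parser would also handle ##contig and other meta-information keys.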

Statistics

linear_mixed_model(y, x[, z_t, k, p_path, …]) Initialize a linear mixed model from a matrix table.
linear_mixed_regression(entry_expr, model[, …]) For each row, test an input variable for association using a linear mixed model.
linear_regression(y, x, covariates[, root, …]) For each row, test an input variable for association with response variables using linear regression.
logistic_regression(test, y, x, covariates) For each row, test an input variable for association with a binary response variable using logistic regression.
pca(entry_expr[, k, compute_loadings]) Run principal component analysis (PCA) on numeric columns derived from a matrix table.
poisson_regression(test, y, x, covariates[, …]) For each row, test an input variable for association with a count response variable using Poisson regression.
row_correlation(entry_expr[, block_size]) Computes the correlation matrix between row vectors.
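linear_regression, logistic_regression and poisson_regression each fit one model per row. To make "for each row, test an input variable" concrete, here is a dependency-free sketch of the simplest case: ordinary least squares of a response y on one row's values plus an intercept, reporting the slope, its standard error and the t statistic. It omits covariates, missing values and p-values, and the function name per_row_linear_regression is hypothetical; this is not Hail's API or implementation.

```python
import math


def per_row_linear_regression(rows, y):
    """Regress y on each row's values with an intercept, one row at a time.

    Returns one (beta, standard_error, t_stat) tuple per row; rows whose
    values are constant cannot be fit and yield NaNs.
    """
    n = len(y)
    y_mean = sum(y) / n
    results = []
    for x in rows:
        x_mean = sum(x) / n
        sxx = sum((xi - x_mean) ** 2 for xi in x)
        if sxx == 0:
            results.append((math.nan, math.nan, math.nan))
            continue
        sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
        beta = sxy / sxx
        intercept = y_mean - beta * x_mean
        # Residual variance uses n - 2 degrees of freedom (intercept + slope).
        rss = sum((yi - intercept - beta * xi) ** 2 for xi, yi in zip(x, y))
        se = math.sqrt(rss / (n - 2) / sxx)
        t_stat = beta / se if se > 0 else math.copysign(math.inf, beta)
        results.append((beta, se, t_stat))
    return results
```

For example, per_row_linear_regression([[0, 1, 2, 3, 4]], [1, 2, 3, 4, 6]) gives a slope of 1.2. Adding covariates turns each per-row fit into a small multiple regression, which this sketch deliberately leaves out.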

Genetics

balding_nichols_model(n_populations, …[, …]) Generate a matrix table of variants, samples, and genotypes using the Balding-Nichols model.
concordance(left, right) Calculate call concordance between two datasets.
filter_intervals(ds, intervals[, keep]) Filter rows with a list of intervals.
filter_alleles(mt, f) Filter alternate alleles.
filter_alleles_hts(mt, f, subset) Filter alternate alleles and update standard GATK entry fields.
genetic_relatedness_matrix(call_expr) Compute the genetic relatedness matrix (GRM).
hwe_normalized_pca(call_expr[, k, …]) Run principal component analysis (PCA) on the Hardy-Weinberg-normalized genotype call matrix.
identity_by_descent(dataset[, maf, bounded, …]) Compute matrix of identity-by-descent estimates.
impute_sex(call[, aaf_threshold, …]) Impute sex of samples by calculating the inbreeding coefficient on the X chromosome.
ld_matrix(entry_expr, locus_expr, radius[, …]) Computes the windowed correlation (linkage disequilibrium) matrix between variants.
ld_prune(call_expr[, r2, bp_window_size, …]) Returns a maximal subset of variants that are nearly uncorrelated within each window.
mendel_errors(call, pedigree) Find Mendel errors; count per variant, individual and nuclear family.
de_novo(mt, pedigree, pop_frequency_prior, …) Call putative de novo events from trio data.
nirvana(dataset, config) Annotate variants using Nirvana.
pc_relate(call_expr, min_individual_maf, *) Compute relatedness estimates between individuals using a variant of the PC-Relate method.
realized_relationship_matrix(call_expr) Computes the realized relationship matrix (RRM).
sample_qc(mt[, name]) Compute per-sample metrics useful for quality control.
skat(key_expr, weight_expr, y, x, covariates) Test each keyed group of rows for association by linear or logistic SKAT test.
split_multi(ds[, keep_star, left_aligned]) Split multiallelic variants.
split_multi_hts(ds[, keep_star, …]) Split multiallelic variants for datasets that contain one or more fields from a standard high-throughput sequencing entry schema.
transmission_disequilibrium_test(dataset, …) Performs the transmission disequilibrium test on trios.
trio_matrix(dataset, pedigree[, complete_trios]) Builds and returns a matrix where columns correspond to trios and entries contain genotypes for the trio.
variant_qc(mt[, name]) Compute common variant statistics (quality control metrics).
vep(dataset, …) Annotate variants with VEP.
window_by_locus(mt, bp_window_size) Collect arrays of row and entry values from preceding loci.
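hwe_normalized_pca works on a Hardy-Weinberg-normalized genotype matrix, and the usual formulation of a genetic relatedness matrix starts from the same normalization. The plain-Python sketch below assumes the common form M_ij = (C_ij - 2p_i) / sqrt(2p_i(1 - p_i)m), where C_ij is the alternate-allele count of sample j at variant i, p_i is the alternate allele frequency and m is the number of retained (non-monomorphic) variants; the relatedness matrix is then M^T M. The formula, the helper names and the no-missing-calls assumption are illustrative, not a statement of Hail's exact implementation.

```python
import math


def hwe_normalize(calls):
    """Normalize a variants x samples matrix of alt-allele counts (0, 1, 2).

    Assumes no missing calls. Monomorphic variants (p == 0 or p == 1) are
    dropped because their normalized values would be undefined.
    """
    kept = []
    for row in calls:
        p = sum(row) / (2 * len(row))  # alternate allele frequency
        if 0 < p < 1:
            kept.append((row, p))
    m = len(kept)
    return [[(c - 2 * p) / math.sqrt(2 * p * (1 - p) * m) for c in row]
            for row, p in kept]


def relatedness_matrix(calls):
    """Sample x sample matrix M^T M built from the normalized genotypes."""
    normalized = hwe_normalize(calls)
    n = len(calls[0])
    return [[sum(r[i] * r[j] for r in normalized) for j in range(n)]
            for i in range(n)]
```

Dividing by sqrt(m) inside the normalization means the diagonal of M^T M averages to roughly 1 under Hardy-Weinberg equilibrium, rather than growing with the number of variants.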

Miscellaneous

grep(regex, path[, max_count]) Search the given paths for all lines containing regex matches.
maximal_independent_set(i, j[, keep, …]) Return a table containing the vertices in a near maximal independent set of an undirected graph whose edges are given by a two-column table.
rename_duplicates(dataset[, name]) Rename duplicate column keys.
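One common use of maximal_independent_set is pruning related samples: each edge joins a pair of samples judged too closely related, and the independent set is a subset that can be kept. The pure-Python sketch below uses a simple greedy heuristic, repeatedly deleting a vertex of highest degree until no edges remain. The function name greedy_independent_set and its tie-breaking rule are illustrative assumptions; it takes a plain list of edges rather than Hail's (i, j, keep, …) table-based signature and is not Hail's implementation.

```python
def greedy_independent_set(edges, vertices=()):
    """Return a set of vertices, no two of which are joined by an edge.

    Greedy heuristic: while any edge remains, delete a vertex of highest degree.
    """
    # Adjacency sets; optional isolated vertices are included up front.
    adj = {v: set() for v in vertices}
    for i, j in edges:
        if i == j:
            continue  # self-loops are skipped in this sketch
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    while any(adj.values()):
        # Highest degree wins; ties go to the smallest vertex so results are reproducible.
        v = max(sorted(adj), key=lambda u: len(adj[u]))
        for u in adj.pop(v):
            adj[u].discard(v)
    return set(adj)
```

Removing high-degree vertices first tends to keep many vertices, because one deletion clears many edges at once. The result is independent but not guaranteed to be maximum.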