Methods
Import / Export
export_elasticsearch (t, host, port, index, …) |
Export a Table to Elasticsearch. |
export_gen (dataset, output[, precision, gp, …]) |
Export a MatrixTable as GEN and SAMPLE files. |
export_bgen (mt, output[, gp, varid, rsid, …]) |
Export a MatrixTable as a BGEN 1.2 file with 8 bits per probability. |
export_plink (dataset, output[, call, …]) |
Export a MatrixTable as PLINK2 BED, BIM and FAM files. |
export_vcf (dataset, output[, …]) |
Export a MatrixTable as a VCF file. |
get_vcf_metadata (path) |
Extract metadata from VCF header. |
import_bed (path[, reference_genome, …]) |
Import a UCSC BED file as a Table. |
import_bgen (path, entry_fields[, …]) |
Import BGEN file(s) as a MatrixTable. |
import_fam (path[, quant_pheno, delimiter, …]) |
Import a PLINK FAM file into a Table. |
import_gen (path[, sample_file, tolerance, …]) |
Import GEN file(s) as a MatrixTable. |
import_locus_intervals (path[, …]) |
Import a locus interval list as a Table. |
import_matrix_table (paths[, row_fields, …]) |
Import tab-delimited file(s) as a MatrixTable. |
import_plink (bed, bim, fam[, …]) |
Import a PLINK dataset (BED, BIM, FAM) as a MatrixTable. |
import_table (paths[, key, min_partitions, …]) |
Import delimited text file (text table) as Table. |
import_vcf (path[, force, force_bgz, …]) |
Import VCF file(s) as a MatrixTable. |
import_gvcfs (path, partitions[, force, …]) |
(Experimental) Import multiple VCFs as multiple MatrixTable objects. |
index_bgen (path[, index_file_map, …]) |
Index BGEN files as required by import_bgen(). |
read_matrix_table (path, *[, _intervals, …]) |
Read in a MatrixTable written with MatrixTable.write(). |
read_table (path, *[, _intervals, …]) |
Read in a Table written with Table.write(). |
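As a rough illustration of what "extract metadata from VCF header" involves, the sketch below parses the structured `##KEY=<ID=...,...>` meta-information lines of a VCF header using only the Python standard library. It is not Hail's implementation of get_vcf_metadata: the function name `parse_vcf_header` and the nested-dictionary return layout are assumptions made for this example.

```python
import re

# Matches structured header lines such as ##INFO=<ID=DP,Number=1,...>
STRUCTURED = re.compile(r'^##(\w+)=<(.*)>$')
# Splits key=value pairs; quoted values may contain commas and stay intact
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_vcf_header(lines):
    """Collect structured header lines into {section: {ID: {key: value}}}.

    Hypothetical helper for illustration; the layout is an assumption.
    """
    metadata = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.startswith("##"):
            break  # meta-information lines end at the #CHROM line
        match = STRUCTURED.match(line)
        if match is None:
            continue  # unstructured lines such as ##fileformat=VCFv4.2
        section, body = match.groups()
        fields = {key: value.strip('"') for key, value in PAIR.findall(body)}
        entry_id = fields.pop("ID", None)
        if entry_id is not None:
            metadata.setdefault(section, {})[entry_id] = fields
    return metadata


header = [
    '##fileformat=VCFv4.2',
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total depth, all reads">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO',
]
meta = parse_vcf_header(header)
```

Here `meta["INFO"]["DP"]` holds the `Number`, `Type` and `Description` attributes of the `DP` field. Lines without an `ID` attribute are skipped in this sketch.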
Statistics
linear_mixed_model (y, x[, z_t, k, p_path, …]) |
Initialize a linear mixed model from a matrix table. |
linear_mixed_regression_rows (entry_expr, model) |
For each row, test an input variable for association using a linear mixed model. |
linear_regression_rows (y, x, covariates[, …]) |
For each row, test an input variable for association with response variables using linear regression. |
logistic_regression_rows (test, y, x, covariates) |
For each row, test an input variable for association with a binary response variable using logistic regression. |
poisson_regression_rows (test, y, x, covariates) |
For each row, test an input variable for association with a count response variable using Poisson regression. |
pca (entry_expr[, k, compute_loadings]) |
Run principal component analysis (PCA) on numeric columns derived from a matrix table. |
row_correlation (entry_expr[, block_size]) |
Computes the correlation matrix between row vectors. |
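To make "correlation matrix between row vectors" concrete, here is a minimal pure-Python sketch of the Pearson correlation between every pair of rows. It assumes small, dense rows of equal length with no missing values, and the name `row_correlation_sketch` is hypothetical; it does not reproduce how row_correlation handles blocking, missing entries, or distributed data.

```python
import math


def row_correlation_sketch(rows):
    """Return the Pearson correlation matrix between equal-length rows."""
    # Center each row on its own mean
    centered = []
    for row in rows:
        mean = sum(row) / len(row)
        centered.append([x - mean for x in row])
    # Euclidean norm of each centered row
    norms = [math.sqrt(sum(x * x for x in row)) for row in centered]

    n = len(rows)
    corr = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                # A constant row has zero variance, so correlation is undefined
                corr[i][j] = float("nan")
            else:
                dot = sum(a * b for a, b in zip(centered[i], centered[j]))
                corr[i][j] = dot / (norms[i] * norms[j])
    return corr


corr = row_correlation_sketch([[1, 2, 3], [2, 4, 6], [3, 2, 1]])
```

In this example the first two rows are perfectly correlated and the third is perfectly anti-correlated with both, up to floating-point rounding.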
Genetics
balding_nichols_model (n_populations, …[, …]) |
Generate a matrix table of variants, samples, and genotypes using the Balding-Nichols or Pritchard-Stephens-Donnelly model. |
concordance (left, right, *[, …]) |
Calculate call concordance with another dataset. |
filter_intervals (ds, intervals[, keep]) |
Filter rows with a list of intervals. |
filter_alleles (mt, f) |
Filter alternate alleles. |
filter_alleles_hts (mt, f, subset) |
Filter alternate alleles and update standard GATK entry fields. |
genetic_relatedness_matrix (call_expr) |
Compute the genetic relatedness matrix (GRM). |
hwe_normalized_pca (call_expr[, k, …]) |
Run principal component analysis (PCA) on the Hardy-Weinberg-normalized genotype call matrix. |
identity_by_descent (dataset[, maf, bounded, …]) |
Compute matrix of identity-by-descent estimates. |
impute_sex (call[, aaf_threshold, …]) |
Impute sex of samples by calculating inbreeding coefficient on the X chromosome. |
ld_matrix (entry_expr, locus_expr, radius[, …]) |
Computes the windowed correlation (linkage disequilibrium) matrix between variants. |
ld_prune (call_expr[, r2, bp_window_size, …]) |
Returns a maximal subset of variants that are nearly uncorrelated within each window. |
mendel_errors (call, pedigree) |
Find Mendel errors; count per variant, individual and nuclear family. |
de_novo (mt, pedigree, pop_frequency_prior, …) |
Call putative de novo events from trio data. |
nirvana (dataset, config) |
Annotate variants using Nirvana. |
pc_relate (call_expr, min_individual_maf, *) |
Compute relatedness estimates between individuals using a variant of the PC-Relate method. |
realized_relationship_matrix (call_expr) |
Computes the realized relationship matrix (RRM). |
sample_qc (mt[, name]) |
Compute per-sample metrics useful for quality control. |
skat (key_expr, weight_expr, y, x, covariates) |
Test each keyed group of rows for association by linear or logistic SKAT test. |
lambda_gc (p_value[, approximate]) |
Compute genomic inflation factor (lambda GC) from an Expression of p-values. |
split_multi (ds[, keep_star, left_aligned, …]) |
Split multiallelic variants. |
split_multi_hts (ds[, keep_star, …]) |
Split multiallelic variants for datasets that contain one or more fields from a standard high-throughput sequencing entry schema. |
transmission_disequilibrium_test (dataset, …) |
Performs the transmission disequilibrium test on trios. |
trio_matrix (dataset, pedigree[, complete_trios]) |
Builds and returns a matrix where columns correspond to trios and entries contain genotypes for the trio. |
variant_qc (mt[, name]) |
Compute common variant statistics (quality control metrics). |
vep (dataset, …) |
Annotate variants with VEP. |
window_by_locus (mt, bp_window_size) |
Collect arrays of row and entry values from preceding loci. |
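The genomic inflation factor listed above is conventionally defined as the median of the observed 1-degree-of-freedom chi-square statistics divided by the median of the chi-square(1) distribution (about 0.455). The sketch below computes it from a plain list of p-values with the standard library. It assumes that conventional definition and p-values in (0, 1]; the names `chi2_1df_from_p` and `lambda_gc_sketch` are hypothetical, and this is not a description of how lambda_gc is implemented.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()


def chi2_1df_from_p(p):
    """Upper-tail chi-square (1 df) quantile for a p-value in (0, 1]."""
    # If Z ~ N(0, 1) then Z**2 ~ chi-square(1), so P(Z**2 > x) = p
    # gives x = (Phi^-1(1 - p / 2)) ** 2.
    return _STD_NORMAL.inv_cdf(1.0 - p / 2.0) ** 2


def lambda_gc_sketch(p_values):
    """Median observed chi-square statistic over the expected median."""
    observed = median(chi2_1df_from_p(p) for p in p_values)
    expected = chi2_1df_from_p(0.5)  # median of chi-square(1), about 0.455
    return observed / expected


well_calibrated = lambda_gc_sketch([0.1, 0.5, 0.9])
inflated = lambda_gc_sketch([0.001, 0.01, 0.02])
```

A value near 1 suggests the test statistics follow the null distribution; values well above 1 suggest inflation, for example from population stratification.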
Miscellaneous
grep (regex, path[, max_count]) |
Searches given paths for all lines containing regex matches. |
maximal_independent_set (i, j[, keep, …]) |
Return a table containing the vertices in a near maximal independent set of an undirected graph whose edges are given by a two-column table. |
rename_duplicates (dataset[, name]) |
Rename duplicate column keys. |
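One simple way to obtain a near-maximal independent set from an edge list is to repeatedly remove the vertex with the most remaining neighbors until no edges are left. The sketch below implements that greedy heuristic in plain Python. It is an illustrative stand-in, not the algorithm used by maximal_independent_set; the function name and the tie-breaking rule (larger vertex removed first) are assumptions, and vertices that appear only in self-loops are not recorded.

```python
def greedy_independent_set(edges):
    """Greedily drop high-degree vertices until no edges remain."""
    neighbors = {}
    for i, j in edges:
        if i == j:
            continue  # self-loops are skipped in this sketch
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)

    while True:
        # Vertex with the most remaining neighbors; ties go to the larger vertex
        vertex = max(
            neighbors,
            key=lambda v: (len(neighbors[v]), v),
            default=None,
        )
        if vertex is None or not neighbors[vertex]:
            break  # no vertices, or no edges left
        # Remove the vertex and detach it from each of its neighbors
        for other in neighbors.pop(vertex):
            neighbors[other].discard(vertex)
    return set(neighbors)


path_edges = [(1, 2), (2, 3), (3, 4)]
kept = greedy_independent_set(path_edges)
```

Every remaining vertex has no remaining neighbors, so the result is always an independent set, although the heuristic does not guarantee the largest possible one.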