.. sec-types:

.. testsetup::

    vds = hc.read("data/example.vds").annotate_variants_expr('va.genes = ["ACBD", "DCBA"]')

=====
Types
=====


.. _aggregable:

----------
Aggregable
----------


An ``Aggregable`` is a Hail data type representing a distributed row or column of a matrix. Hail exposes a number of methods to compute on aggregables depending on the data type.

.. _aggregable[array[double]]:

Aggregable[Array[Double]]
=========================

 - **sum()**: *Array[Double]* -- Compute the sum by index. All elements in the aggregable must have the same length.

.. _aggregable[array[float]]:

Aggregable[Array[Float]]
========================

 - **sum()**: *Array[Float]* -- Compute the sum by index. All elements in the aggregable must have the same length.

.. _aggregable[array[int]]:

Aggregable[Array[Int]]
======================

 - **sum()**: *Array[Int]*

	Compute the sum by index. All elements in the aggregable must have the same length.
	
	**Examples**
	
	Count the total number of occurrences of each allele across samples, per variant:
	
	>>> vds_result = vds.annotate_variants_expr('va.AC = gs.map(g => g.oneHotAlleles(v)).sum()')

.. _aggregable[array[long]]:

Aggregable[Array[Long]]
=======================

 - **sum()**: *Array[Long]* -- Compute the sum by index. All elements in the aggregable must have the same length.

.. _aggregable[double]:

Aggregable[Double]
==================

 - **hist(start: Double, end: Double, bins: Int)**: *Struct{binEdges:Array[Double],binFrequencies:Array[Long],nLess:Long,nGreater:Long}*

	.. container:: annotation

		 - **binEdges** (*Array[Double]*) -- Array of bin cutoffs
		 - **binFrequencies** (*Array[Long]*) -- Number of elements that fall in each bin.
		 - **nLess** (*Long*) -- Number of elements less than the minimum bin
		 - **nGreater** (*Long*) -- Number of elements greater than the maximum bin

	Compute frequency distributions of numeric parameters.
	
	**Examples**
	
	Compute GQ-distributions per variant:
	
	>>> vds_result = vds.annotate_variants_expr('va.gqHist = gs.map(g => g.gq).hist(0, 100, 20)')
	
	Compute global GQ-distribution:
	
	>>> gq_hist = vds.query_genotypes('gs.map(g => g.gq).hist(0, 100, 100)')
	
	**Notes**
	
	- The start, end, and bins params are no-scope parameters, which means that while computations like 100 / 4 are acceptable, variable references like ``global.nBins`` are not.
	- Bin size is calculated from (``end`` - ``start``) / ``bins``
	- (``bins`` + 1) breakpoints are generated from the range (start to end by binsize)
	- Each bin is left-inclusive, right-exclusive except the last bin, which includes the maximum value. This means that if there are N total bins, there will be N + 1 elements in ``binEdges``. For the invocation ``hist(0, 3, 3)``, ``binEdges`` would be ``[0, 1, 2, 3]`` where the bins are ``[0, 1), [1, 2), [2, 3]``.

	**Arguments**
	
		 - **start** (*Double*) -- Starting point of first bin
		 - **end** (*Double*) -- End point of last bin
		 - **bins** (*Int*) -- Number of bins to create.
	       

 - **max()**: *Double* -- Compute the maximum of all non-missing elements. The empty max is missing.

 - **min()**: *Double* -- Compute the minimum of all non-missing elements. The empty min is missing.

 - **product()**: *Double* -- Compute the product of all non-missing elements. The empty product is one.

 - **stats()**: *Struct{mean:Double,stdev:Double,min:Double,max:Double,nNotMissing:Long,sum:Double}*

	.. container:: annotation

		 - **mean** (*Double*) -- Mean value
		 - **stdev** (*Double*) -- Standard deviation
		 - **min** (*Double*) -- Minimum value
		 - **max** (*Double*) -- Maximum value
		 - **nNotMissing** (*Long*) -- Number of non-missing values
		 - **sum** (*Double*) -- Sum of all elements

	Compute summary statistics about a numeric aggregable.
	
	**Examples**
	
	Compute the mean genotype quality score per variant:
	
	>>> vds_result = vds.annotate_variants_expr('va.gqMean = gs.map(g => g.gq).stats().mean')
	
	Compute summary statistics on the number of singleton calls per sample:
	
	>>> [singleton_stats] = (vds.sample_qc()
	...     .query_samples(['samples.map(s => sa.qc.nSingleton).stats()']))
	
	Compute GQ and DP statistics stratified by genotype call:
	
	>>> gq_dp = [
	... 'va.homrefGQ = gs.filter(g => g.isHomRef()).map(g => g.gq).stats()',
	... 'va.hetGQ = gs.filter(g => g.isHet()).map(g => g.gq).stats()',
	... 'va.homvarGQ = gs.filter(g => g.isHomVar()).map(g => g.gq).stats()',
	... 'va.homrefDP = gs.filter(g => g.isHomRef()).map(g => g.dp).stats()',
	... 'va.hetDP = gs.filter(g => g.isHet()).map(g => g.dp).stats()',
	... 'va.homvarDP = gs.filter(g => g.isHomVar()).map(g => g.dp).stats()']
	>>> vds_result = vds.annotate_variants_expr(gq_dp)
	
	**Notes**
	
	The ``stats()`` aggregator can be used to replicate some of the values computed by :py:meth:`~hail.VariantDataset.variant_qc` and :py:meth:`~hail.VariantDataset.sample_qc` such as ``dpMean`` and ``dpStDev``.

 - **sum()**: *Double* -- Compute the sum of all non-missing elements. The empty sum is zero.

.. _aggregable[float]:

Aggregable[Float]
=================

 - **max()**: *Float* -- Compute the maximum of all non-missing elements. The empty max is missing.

 - **min()**: *Float* -- Compute the minimum of all non-missing elements. The empty min is missing.

 - **sum()**: *Float* -- Compute the sum of all non-missing elements. The empty sum is zero.

.. _aggregable[genotype]:

Aggregable[Genotype]
====================

 - **callStats(f: Genotype => Variant)**: *Struct{AC:Array[Int],AF:Array[Double],AN:Int,GC:Array[Int]}*

	.. container:: annotation

		 - **AC** (*Array[Int]*) -- Allele count. One element per allele **including reference**. There are two elements for a biallelic variant, or 4 for a variant with three alternate alleles.
		 - **AF** (*Array[Double]*) -- Allele frequency. One element per allele including reference. Sums to 1.
		 - **AN** (*Int*) -- Allele number. This is equal to the sum of AC, or 2 * the total number of called genotypes in the aggregable.
		 - **GC** (*Array[Int]*) -- Genotype count. One element per possible genotype, including reference genotypes -- 3 for biallelic, 6 for triallelic, 10 for 3 alt alleles, and so on. The sum of this array is the number of called genotypes in the aggregable.

	Compute four commonly-used metrics over a set of genotypes in a variant.
	
	**Examples**
	
	Compute phenotype-specific call statistics:
	
	>>> pheno_stats = [
	...   'va.case_stats = gs.filter(g => sa.pheno.isCase).callStats(g => v)',
	...   'va.control_stats = gs.filter(g => !sa.pheno.isCase).callStats(g => v)']
	>>> vds_result = vds.annotate_variants_expr(pheno_stats)
	
	``va.eur_stats.AC`` will be the allele count (AC) computed from individuals marked as "EUR".

	**Arguments**
	
		 - **f** (*Genotype => Variant*) -- Variant lambda expression such as ``g => v``.
	       

 - **hardyWeinberg()**: *Struct{rExpectedHetFrequency:Double,pHWE:Double}*

	.. container:: annotation

		 - **rExpectedHetFrequency** (*Double*) -- Expected rHeterozygosity based on Hardy Weinberg Equilibrium
		 - **pHWE** (*Double*) -- p-value

	Compute Hardy-Weinberg equilibrium p-value.
	
	**Examples**
	
	Add a new variant annotation that calculates HWE p-value by phenotype:
	
	>>> vds_result = vds.annotate_variants_expr([
	...   'va.hweCase = gs.filter(g => sa.pheno.isCase).hardyWeinberg()',
	...   'va.hweControl = gs.filter(g => !sa.pheno.isCase).hardyWeinberg()'])
	
	**Notes**
	
	Hail computes the exact p-value with mid-p-value correction, i.e. the probability of a less-likely outcome plus one-half the probability of an equally-likely outcome. See this `document <LeveneHaldane.pdf>`__ for details on the Levene-Haldane distribution and references.

 - **inbreeding(af: Genotype => Double)**: *Struct{Fstat:Double,nTotal:Long,nCalled:Long,expectedHoms:Double,observedHoms:Long}*

	.. container:: annotation

		 - **Fstat** (*Double*) -- Inbreeding coefficient
		 - **nTotal** (*Long*) -- Number of genotypes analyzed
		 - **nCalled** (*Long*) -- number of genotypes with non-missing calls
		 - **expectedHoms** (*Double*) -- Expected number of homozygote calls
		 - **observedHoms** (*Long*) -- Total number of homozygote calls observed

	Compute inbreeding metric. This aggregator is equivalent to the `\`--het\` method in PLINK <https://www.cog-genomics.org/plink2/basic_stats#ibc>`_.
	
	**Examples**
	
	Calculate the inbreeding metric per sample:
	
	>>> vds_result = (vds.variant_qc()
	...     .annotate_samples_expr('sa.inbreeding = gs.inbreeding(g => va.qc.AF)'))
	
	To obtain the same answer as `PLINK <https://www.cog-genomics.org/plink2>`_, use the following series of commands:
	
	>>> vds_result = (vds.variant_qc()
	...     .filter_variants_expr('va.qc.AC > 1 && va.qc.AF >= 1e-8 && va.qc.nCalled * 2 - va.qc.AC > 1 && va.qc.AF <= 1 - 1e-8 && v.isAutosomal()')
	...     .annotate_samples_expr('sa.inbreeding = gs.inbreeding(g => va.qc.AF)'))
	
	**Notes**
	
	The Inbreeding Coefficient (F) is computed as follows:
	
	#. For each variant and sample with a non-missing genotype call, ``E``, the expected number of homozygotes (computed from user-defined expression for minor allele frequency), is computed as ``1.0 - (2.0*maf*(1.0-maf))``
	#. For each variant and sample with a non-missing genotype call, ``O``, the observed number of homozygotes, is computed as ``0 = heterozygote; 1 = homozygote``
	#. For each variant and sample with a non-missing genotype call, ``N`` is incremented by 1
	#. For each sample, ``E``, ``O``, and ``N`` are combined across variants
	#. ``F`` is calculated by ``(O - E) / (N - E)``

	**Arguments**
	
		 - **af** (*Genotype => Double*) -- Lambda expression for the alternate allele frequency.
	       

 - **infoScore()**: *Struct{score:Double,nIncluded:Int}*

	.. container:: annotation

		 - **score** (*Double*) -- IMPUTE info score
		 - **nIncluded** (*Int*) -- Number of samples with non-missing genotype probability distribution

	Compute the IMPUTE information score.
	
	**Examples**
	
	Calculate the info score per variant:
	
	>>> (hc.import_gen("data/example.gen", "data/example.sample")
	...    .annotate_variants_expr('va.infoScore = gs.infoScore()'))
	
	Calculate group-specific info scores per variant:
	
	>>> vds_result = (hc.import_gen("data/example.gen", "data/example.sample")
	...    .annotate_samples_expr("sa.isCase = pcoin(0.5)")
	...    .annotate_variants_expr(["va.infoScore.case = gs.filter(g => sa.isCase).infoScore()",
	...                             "va.infoScore.control = gs.filter(g => !sa.isCase).infoScore()"]))
	
	**Notes**
	
	We implemented the IMPUTE info measure as described in the `supplementary information from Marchini & Howie. Genotype imputation for genome-wide association studies. Nature Reviews Genetics (2010) <http://www.nature.com/nrg/journal/v11/n7/extref/nrg2796-s3.pdf>`_.
	
	To calculate the info score :math:`I_{A}` for one SNP:
	
	.. math::
	
	    I_{A} =
	    \begin{cases}
	    1 - \frac{\sum_{i=1}^{N}(f_{i} - e_{i}^2)}{2N\hat{\theta}(1 - \hat{\theta})} & \text{when } \hat{\theta} \in (0, 1) \\
	    1 & \text{when } \hat{\theta} = 0, \hat{\theta} = 1\\
	    \end{cases}
	
	- :math:`N` is the number of samples with imputed genotype probabilities [:math:`p_{ik} = P(G_{i} = k)` where :math:`k \in \{0, 1, 2\}`]
	- :math:`e_{i} = p_{i1} + 2p_{i2}` is the expected genotype per sample
	- :math:`f_{i} = p_{i1} + 4p_{i2}`
	- :math:`\hat{\theta} = \frac{\sum_{i=1}^{N}e_{i}}{2N}` is the MLE for the population minor allele frequency
	
	
	Hail will not generate identical results as `QCTOOL <http://www.well.ox.ac.uk/~gav/qctool/#overview>`_ for the following reasons:
	
	  - The floating point number Hail stores for each genotype probability is slightly different than the original data due to rounding and normalization of probabilities.
	  - Hail automatically removes genotype probability distributions that do not meet certain requirements on data import with :py:meth:`~hail.HailContext.import_gen` and :py:meth:`~hail.HailContext.import_bgen`.
	  - Hail does not use the population frequency to impute genotype probabilities when a genotype probability distribution has been set to missing.
	  - Hail calculates the same statistic for sex chromosomes as autosomes while QCTOOL incorporates sex information
	
	.. warning::
	
	    - The info score Hail reports will be extremely different from qctool when a SNP has a high missing rate.
	    - If the genotype data was not imported using the :py:meth:`~hail.HailContext.import_gen` or :py:meth:`~hail.HailContext.import_bgen` commands, then the results for all variants will be ``score = NA`` and ``nIncluded = 0``.
	    - It only makes sense to compute the info score for an Aggregable[Genotype] per variant. While a per-sample info score will run, the result is meaningless.

.. _aggregable[int]:

Aggregable[Int]
===============

 - **max()**: *Int* -- Compute the maximum of all non-missing elements. The empty max is missing.

 - **min()**: *Int* -- Compute the minimum of all non-missing elements. The empty min is missing.

 - **sum()**: *Int* -- Compute the sum of all non-missing elements. The empty sum is zero.

.. _aggregable[long]:

Aggregable[Long]
================

 - **max()**: *Long* -- Compute the maximum of all non-missing elements. The empty max is missing.

 - **min()**: *Long* -- Compute the minimum of all non-missing elements. The empty min is missing.

 - **product()**: *Long* -- Compute the product of all non-missing elements. The empty product is one.

 - **sum()**: *Long* -- Compute the sum of all non-missing elements. The empty sum is zero.

.. _aggregable[t]:

Aggregable[T]
=============

 - **collect()**: *Array[T]*

	Returns an array with all of the elements in the aggregable. Order is not guaranteed.
	
	**Examples**
	
	Collect the list of sample IDs with heterozygote genotype calls per variant:
	
	>>> vds_result = vds.annotate_variants_expr('va.hetSamples = gs.filter(g => g.isHet()).map(g => s).collect()')
	
	``va.hetSamples`` will have the type ``Array[String]``.

 - **collectAsSet()**: *Set[T]* -- Returns the set of all unique elements in the aggregable.

 - **count()**: *Long*

	Counts the number of elements in an aggregable.
	
	**Examples**
	
	Count the number of heterozygote genotype calls in an aggregable of genotypes (``gs``):
	
	>>> vds_result = vds.annotate_variants_expr('va.nHets = gs.filter(g => g.isHet()).count()')

 - **counter()**: *Dict[T, Long]*

	Counts the number of occurrences of each element in an aggregable.
	
	**Examples**
	
	Compute the number of indels in each chromosome:
	
	>>> [indels_per_chr] = vds.query_variants(['variants.filter(v => v.altAllele().isIndel()).map(v => v.contig).counter()'])
	
	**Notes**
	
	We recommend this function is used with the `Python counter object <https://docs.python.org/2/library/collections.html#collections.Counter>`_.
	
	>>> [counter] = vds.query_variants(['variants.flatMap(v => v.altAlleles).counter()'])
	>>> from collections import Counter
	>>> counter = Counter(counter)
	>>> print(counter.most_common(5))
	[(AltAllele(C, T), 129L),
	 (AltAllele(G, A), 112L),
	 (AltAllele(C, A), 60L),
	 (AltAllele(A, G), 46L),
	 (AltAllele(T, C), 44L)]

 - **exists(expr: T => Boolean)**: *Boolean*

	Returns true if **any** element in the aggregator satisfies the condition given by ``expr`` and false otherwise.

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **filter(f: T => Boolean)**: *Aggregable[T]*

	Subsets an aggregable by evaluating ``f`` for each element and keeping those elements that evaluate to true.
	
	**Examples**
	
	Compute Hardy Weinberg Equilibrium for cases only:
	
	>>> vds_result = vds.annotate_variants_expr("va.hweCase = gs.filter(g => sa.isCase).hardyWeinberg()")

	**Arguments**
	
		 - **f** (*T => Boolean*) -- Boolean lambda expression.
	       

 - **flatMap(f: T => Set[U])**: *Aggregable[U]*

	Returns a new aggregable by applying a function ``f`` to each element and concatenating the resulting sets.
	
	Compute a list of genes per sample with loss of function variants (result does not have duplicate entries):
	
	>>> vds_result = vds.annotate_samples_expr('sa.lof_genes = gs.filter(g => va.consequence == "LOF" && g.nNonRefAlleles() > 0).flatMap(g => va.genes.toSet()).collect()')

	**Arguments**
	
		 - **f** (*T => Set[U]*) -- Lambda expression.
	       

 - **flatMap(a: T => Array[U])**: *Aggregable[U]*

	Returns a new aggregable by applying a function ``f`` to each element and concatenating the resulting arrays.
	
	**Examples**
	
	Compute a list of genes per sample with loss of function variants (result may have duplicate entries):
	
	>>> vds_result = vds.annotate_samples_expr('sa.lof_genes = gs.filter(g => va.consequence == "LOF" && g.nNonRefAlleles() > 0).flatMap(g => va.genes).collect()')

 - **forall(expr: T => Boolean)**: *Boolean*

	Returns a true if **all** elements in the array satisfies the condition given by ``expr`` and false otherwise.

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **fraction(a: T => Boolean)**: *Double*

	Computes the ratio of the number of occurrences for which a boolean condition evaluates to true, divided by the number of included elements in the aggregable.
	
	**Examples**
	
	Filter variants with a call rate less than 95%:
	
	>>> vds_result = vds.filter_variants_expr('gs.fraction(g => g.isCalled()) > 0.90')
	
	Compute the differential missingness at SNPs and indels:
	
	>>> exprs = ['sa.SNPmissingness = gs.filter(g => v.altAllele().isSNP()).fraction(g => g.isNotCalled())',
	...          'sa.indelmissingness = gs.filter(g => v.altAllele().isIndel()).fraction(g => g.isNotCalled())']
	>>> vds_result = vds.annotate_samples_expr(exprs)

 - **map(f: T => U)**: *Aggregable[U]*

	Change the type of an aggregable by evaluating ``f`` for each element.
	
	**Examples**
	
	Convert an aggregable of genotypes (``gs``) to an aggregable of genotype quality scores and then compute summary statistics:
	
	>>> vds_result = vds.annotate_variants_expr("va.gqStats = gs.map(g => g.gq).stats()")

	**Arguments**
	
		 - **f** (*T => U*) -- Lambda expression.
	       

 - **take(n: Int)**: *Array[T]*

	Take the first ``n`` items of an aggregable.
	
	**Examples**
	
	Collect the first 5 sample IDs with at least one alternate allele per variant:
	
	>>> vds_result = vds.annotate_variants_expr("va.nonRefSamples = gs.filter(g => g.nNonRefAlleles() > 0).map(g => s).take(5)")

	**Arguments**
	
		 - **n** (*Int*) -- Number of items to take.
	       

 - **takeBy(f: T => String, n: Int)**: *Array[T]*

	Returns the first ``n`` items of an aggregable in ascending order, ordered
	by the result of ``f``. ``NA`` always appears last. If the aggregable
	contains less than ``n`` items, then the result will contain as many
	elements as the aggregable contains.

	**Arguments**
	
		 - **f** (*T => String*) -- Lambda expression for mapping an aggregable to an ordered value.
		 - **n** (*Int*) -- Number of items to take.
	       

 - **takeBy(f: T => Double, n: Int)**: *Array[T]*

	Returns the first ``n`` items of an aggregable in ascending order, ordered
	by the result of ``f``. ``NA`` always appears last. If the aggregable
	contains less than ``n`` items, then the result will contain as many
	elements as the aggregable contains.
	
	
	Note that ``NaN`` always appears after any finite or infinite floating-point
	numbers but before ``NA``. For example, consider an aggregable containing
	these elements::
	
	  Infinity, -1, 1, 0, -Infinity, NA, NaN
	
	The expression ``gs.takeBy(x => x, 7)`` would return the array::
	
	  [-Infinity, -1, 0, 1, Infinity, NaN, NA]
	
	The expression ``gs.takeBy(x => -x, 7)`` would return the array::
	
	  [Infinity, 1, 0, -1, -Infinity, NaN, NA]

	**Arguments**
	
		 - **f** (*T => Double*) -- Lambda expression for mapping an aggregable to an ordered value.
		 - **n** (*Int*) -- Number of items to take.
	       

 - **takeBy(f: T => Float, n: Int)**: *Array[T]*

	Returns the first ``n`` items of an aggregable in ascending order, ordered
	by the result of ``f``. ``NA`` always appears last. If the aggregable
	contains less than ``n`` items, then the result will contain as many
	elements as the aggregable contains.
	
	
	Note that ``NaN`` always appears after any finite or infinite floating-point
	numbers but before ``NA``. For example, consider an aggregable containing
	these elements::
	
	  Infinity, -1, 1, 0, -Infinity, NA, NaN
	
	The expression ``gs.takeBy(x => x, 7)`` would return the array::
	
	  [-Infinity, -1, 0, 1, Infinity, NaN, NA]
	
	The expression ``gs.takeBy(x => -x, 7)`` would return the array::
	
	  [Infinity, 1, 0, -1, -Infinity, NaN, NA]

	**Arguments**
	
		 - **f** (*T => Float*) -- Lambda expression for mapping an aggregable to an ordered value.
		 - **n** (*Int*) -- Number of items to take.
	       

 - **takeBy(f: T => Long, n: Int)**: *Array[T]*

	Returns the first ``n`` items of an aggregable in ascending order, ordered
	by the result of ``f``. ``NA`` always appears last. If the aggregable
	contains less than ``n`` items, then the result will contain as many
	elements as the aggregable contains.
	
	
	**Examples**
	
	Consider an aggregable ``gs`` containing these elements::
	
	  7, 6, 3, NA, 1, 2, NA, 4, 5, -1
	
	The expression ``gs.takeBy(x => x, 5)`` would return the array::
	
	  [-1, 1, 2, 3, 4]
	
	The expression ``gs.takeBy(x => -x, 5)`` would return the array::
	
	  [7, 6, 5, 4, 3]
	
	The expression ``gs.takeBy(x => x, 10)`` would return the array::
	
	  [-1, 1, 2, 3, 4, 5, 6, 7, NA, NA]
	
	Returns the 10 samples with the least number of singletons:
	
	>>> samplesMostSingletons = (vds
	...   .sample_qc()
	...   .query_samples('samples.takeBy(s => sa.qc.nSingleton, 10)'))

	**Arguments**
	
		 - **f** (*T => Long*) -- Lambda expression for mapping an aggregable to an ordered value.
		 - **n** (*Int*) -- Number of items to take.
	       

 - **takeBy(f: T => Int, n: Int)**: *Array[T]*

	Returns the first ``n`` items of an aggregable in ascending order, ordered
	by the result of ``f``. ``NA`` always appears last. If the aggregable
	contains less than ``n`` items, then the result will contain as many
	elements as the aggregable contains.
	
	
	**Examples**
	
	Consider an aggregable ``gs`` containing these elements::
	
	  7, 6, 3, NA, 1, 2, NA, 4, 5, -1
	
	The expression ``gs.takeBy(x => x, 5)`` would return the array::
	
	  [-1, 1, 2, 3, 4]
	
	The expression ``gs.takeBy(x => -x, 5)`` would return the array::
	
	  [7, 6, 5, 4, 3]
	
	The expression ``gs.takeBy(x => x, 10)`` would return the array::
	
	  [-1, 1, 2, 3, 4, 5, 6, 7, NA, NA]
	
	Returns the 10 samples with the least number of singletons:
	
	>>> samplesMostSingletons = (vds
	...   .sample_qc()
	...   .query_samples('samples.takeBy(s => sa.qc.nSingleton, 10)'))

	**Arguments**
	
		 - **f** (*T => Int*) -- Lambda expression for mapping an aggregable to an ordered value.
		 - **n** (*Int*) -- Number of items to take.
	       

.. _altallele:

---------
AltAllele
---------


An ``AltAllele`` is a Hail data type representing an alternate allele in the Variant Dataset.

 - **alt**: *String* -- Alternate allele base sequence.

 - **category()**: *String* -- the alt allele type, i.e one of SNP, Insertion, Deletion, Star, MNP, Complex

 - **isComplex()**: *Boolean* -- True if not a SNP, MNP, star, insertion, or deletion.

 - **isDeletion()**: *Boolean* -- True if ``v.ref`` begins with and is longer than ``v.alt``.

 - **isIndel()**: *Boolean* -- True if an insertion or a deletion.

 - **isInsertion()**: *Boolean* -- True if ``v.alt`` begins with and is longer than ``v.ref``.

 - **isMNP()**: *Boolean* -- True if ``v.ref`` and ``v.alt`` are the same length and differ in more than one position.

 - **isSNP()**: *Boolean* -- True if ``v.ref`` and ``v.alt`` are the same length and differ in one position.

 - **isStar()**: *Boolean* -- True if ``v.alt`` is ``*``.

 - **isTransition()**: *Boolean* -- True if a purine-purine or pyrimidine-pyrimidine SNP.

 - **isTransversion()**: *Boolean* -- True if a purine-pyrimidine SNP.

 - **ref**: *String* -- Reference allele base sequence.

.. _array:

-----
Array
-----


An ``Array`` is a collection of items that all have the same data type (ex: Int, String) and are indexed. Arrays can be constructed by specifying ``[item1, item2, ...]`` and they are 0-indexed.

An example of constructing an array and accessing an element is:

.. code-block:: text
    :emphasize-lines: 2

    let a = [1, 10, 3, 7] in a[1]
    result: 10

They can also be nested such as Array[Array[Int]]:

.. code-block:: text
    :emphasize-lines: 2

    let a = [[1, 2, 3], [4, 5], [], [6, 7]] in a[1]
    result: [4, 5]

.. _array[array[t]]:

Array[Array[T]]
===============

 - **flatten()**: *Array[T]*

	Flattens a nested array by concatenating all its rows into a single array.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [[1, 3], [2, 4]] in a.flatten()
	    result: [1, 3, 2, 4]

.. _array[boolean]:

Array[Boolean]
==============

 - **sort(ascending: Boolean)**: *Array[Boolean]*

	Sort the collection with the ordering specified by ``ascending``.

	**Arguments**
	
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sort()**: *Array[Boolean]* -- Sort the collection in ascending order.

.. _array[double]:

Array[Double]
=============

 - **max()**: *Double* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Double* -- Median value of the collection.

 - **min()**: *Double* -- Smallest element in the collection.

 - **product()**: *Double* -- Product of all elements in the collection (returns 1 if empty).

 - **sort(ascending: Boolean)**: *Array[Double]*

	Sort the collection with the ordering specified by ``ascending``.

	**Arguments**
	
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sort()**: *Array[Double]* -- Sort the collection in ascending order.

 - **sum()**: *Double* -- Sum of all elements in the collection.

.. _array[float]:

Array[Float]
============

 - **max()**: *Float* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Float* -- Median value of the collection.

 - **min()**: *Float* -- Smallest element in the collection.

 - **product()**: *Float* -- Product of all elements in the collection (returns 1 if empty).

 - **sort(ascending: Boolean)**: *Array[Float]*

	Sort the collection with the ordering specified by ``ascending``.

	**Arguments**
	
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sort()**: *Array[Float]* -- Sort the collection in ascending order.

 - **sum()**: *Float* -- Sum of all elements in the collection.

.. _array[int]:

Array[Int]
==========

 - **max()**: *Int* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Int* -- Median value of the collection.

 - **min()**: *Int* -- Smallest element in the collection.

 - **product()**: *Int* -- Product of all elements in the collection (returns 1 if empty).

 - **sort(ascending: Boolean)**: *Array[Int]*

	Sort the collection with the ordering specified by ``ascending``.

	**Arguments**
	
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sort()**: *Array[Int]* -- Sort the collection in ascending order.

 - **sum()**: *Int* -- Sum of all elements in the collection.

.. _array[long]:

Array[Long]
===========

 - **max()**: *Long* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Long* -- Median value of the collection.

 - **min()**: *Long* -- Smallest element in the collection.

 - **product()**: *Long* -- Product of all elements in the collection (returns 1 if empty).

 - **sort(ascending: Boolean)**: *Array[Long]*

	Sort the collection with the ordering specified by ``ascending``.

	**Arguments**
	
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sort()**: *Array[Long]* -- Sort the collection in ascending order.

 - **sum()**: *Long* -- Sum of all elements in the collection.

.. _array[string]:

Array[String]
=============

 - **mkString(delimiter: String)**: *String*

	Concatenates all elements of this array into a single string where each element is separated by the ``delimiter``.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = ["a", "b", "c"] in a.mkString("::")
	    result: "a::b::c"

	**Arguments**
	
		 - **delimiter** (*String*) -- String that separates each element.
	       

 - **sort(ascending: Boolean)**: *Array[String]*

	Sort the collection with the ordering specified by ``ascending``.

	**Arguments**
	
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sort()**: *Array[String]* -- Sort the collection in ascending order.

.. _array[t]:

Array[T]
========

 - **[:\j]**: *Array[T]*

	
	Returns a slice of the array from the first element until the j*th* element
	(0-indexed). Negative indices are interpreted as offsets from the end
	of the array.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[:4]
	    result: [0, 2, 4, 6]
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[:-4]
	    result: [0, 2]

	**Arguments**
	
		 - **j** (*Int*) -- End index of the slice (not included in result).
	       

 - **[:]**: *Array[T]*

	Returns a copy of the array.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6] in a[:]
	    result: [0, 2, 4, 6]

 - **[\i:\j]**: *Array[T]*

	Returns a slice of the array from the i*th* element until the j*th* element
	(both 0-indexed). Negative indices are interpreted as offsets from the end
	of the array.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[2:4]
	    result: [4, 6]
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[-3:-1]
	    result: [6, 8]
	
	A handy way to understand the behavior of negative indicies is to recall
	this rule:
	
	.. code-block:: text
	
	    s[-i:-j] == s[s.length - i, s.length - j]

	**Arguments**
	
		 - **i** (*Int*) -- Starting index of the slice.
		 - **j** (*Int*) -- End index of the slice (not included in result).
	       

 - **[\i:]**: *Array[T]*

	Returns a slice of the array from the i*th* element (0-indexed) to the
	end. Negative indices are interpreted as offsets from the end of the array.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[3:]
	    result: [6, 8, 10]
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[-5:]
	    result: [2, 4, 6, 8, 10]

	**Arguments**
	
		 - **i** (*Int*) -- Starting index of the slice.
	       

 - **[i]**: *T*

	Returns the i*th* element (0-indexed) of the array, or throws an exception if ``i`` is an invalid index.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a[2]
	    result: 4

	**Arguments**
	
		 - **i** (*Int*) -- Index of the element to return.
	       

 - **append(a: T)**: *Array[T]* -- 	Returns the result of adding the element `a` to the end of this Array.

 - **exists(expr: T => Boolean)**: *Boolean*

	Returns a boolean which is true if **any** element in the array satisfies the condition given by ``expr``. false otherwise.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [1, 2, 3, 4, 5, 6] in a.exists(e => e > 4)
	    result: true

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **extend(a: Array[T])**: *Array[T]* -- 	Returns the concatenation of this Array followed by Array `a`.

 - **filter(expr: T => Boolean)**: *Array[T]*

	Returns a new array subsetted to the elements where ``expr`` evaluates to true.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [1, 4, 5, 6, 10] in a.filter(e => e % 2 == 0)
	    result: [4, 6, 10]

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **find(expr: T => Boolean)**: *T*

	Returns the first non-missing element of the array for which expr is true. If no element satisfies the predicate, find returns NA.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = ["cat", "dog", "rabbit"] in a.find(e => 'bb' ~ e)
	    result: "rabbit"

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **flatMap(expr: T => Array[U])**: *Array[U]*

	Returns a new array by applying a function to each subarray and concatenating the resulting arrays.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [[1, 2, 3], [4, 5], [6]] in a.flatMap(e => e + 1)
	    result: [2, 3, 4, 5, 6, 7]

	**Arguments**
	
		 - **expr** (*T => Array[U]*) -- Lambda expression.
	       

 - **forall(expr: T => Boolean)**: *Boolean*

	Returns a boolean which is true if **all** elements in the array satisfies the condition given by ``expr`` and false otherwise.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 2, 4, 6, 8, 10] in a.forall(e => e % 2 == 0)
	    result: true

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **groupBy(a: T => U)**: *Dict[U, Array[T]]*

 - **head()**: *T* -- Selects the first element.

 - **isEmpty()**: *Boolean* -- Returns true if the number of elements is equal to 0. false otherwise.

 - **length()**: *Int* -- Number of elements in the collection.

 - **map(expr: T => U)**: *Array[U]*

	Returns a new array produced by applying ``expr`` to each element.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = [0, 1, 2, 3] in a.map(e => pow(2, e))
	    result: [1, 2, 4, 8]

	**Arguments**
	
		 - **expr** (*T => U*) -- Lambda expression.
	       

 - **size()**: *Int* -- Number of elements in the collection.

 - **sortBy(f: T => String, ascending: Boolean)**: *Array[T]*

	Sort the collection with the ordering specified by ``ascending`` after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => String*) -- Lambda expression.
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sortBy(f: T => String)**: *Array[T]*

	Sort the collection in ascending order after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => String*) -- Lambda expression.
	       

 - **sortBy(f: T => Double, ascending: Boolean)**: *Array[T]*

	Sort the collection with the ordering specified by ``ascending`` after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Double*) -- Lambda expression.
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sortBy(f: T => Double)**: *Array[T]*

	Sort the collection in ascending order after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Double*) -- Lambda expression.
	       

 - **sortBy(f: T => Float, ascending: Boolean)**: *Array[T]*

	Sort the collection with the ordering specified by ``ascending`` after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Float*) -- Lambda expression.
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sortBy(f: T => Float)**: *Array[T]*

	Sort the collection in ascending order after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Float*) -- Lambda expression.
	       

 - **sortBy(f: T => Long, ascending: Boolean)**: *Array[T]*

	Sort the collection with the ordering specified by ``ascending`` after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Long*) -- Lambda expression.
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sortBy(f: T => Long)**: *Array[T]*

	Sort the collection in ascending order after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Long*) -- Lambda expression.
	       

 - **sortBy(f: T => Int, ascending: Boolean)**: *Array[T]*

	Sort the collection with the ordering specified by ``ascending`` after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Int*) -- Lambda expression.
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sortBy(f: T => Int)**: *Array[T]*

	Sort the collection in ascending order after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Int*) -- Lambda expression.
	       

 - **sortBy(f: T => Boolean, ascending: Boolean)**: *Array[T]*

	Sort the collection with the ordering specified by ``ascending`` after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Boolean*) -- Lambda expression.
		 - **ascending** (*Boolean*) -- If true, sort the collection in ascending order. Otherwise, sort in descending order.
	       

 - **sortBy(f: T => Boolean)**: *Array[T]*

	Sort the collection in ascending order after evaluating ``f`` for each element.

	**Arguments**
	
		 - **f** (*T => Boolean*) -- Lambda expression.
	       

 - **tail()**: *Array[T]* -- Selects all elements except the first.

 - **toArray()**: *Array[T]* -- Convert collection to an Array.

 - **toSet()**: *Set[T]* -- Convert collection to a Set.

.. _boolean:

-------
Boolean
-------


 - **max(a: Boolean)**: *Boolean* -- 	Returns the maximum value.

 - **min(a: Boolean)**: *Boolean* -- 	Returns the minimum value.

 - **toDouble()**: *Double* -- Convert value to a Double. Returns 1.0 if true, else 0.0.

 - **toFloat()**: *Float* -- Convert value to a Float. Returns 1.0 if true, else 0.0.

 - **toInt()**: *Int* -- Convert value to an Integer. Returns 1 if true, else 0.

 - **toLong()**: *Long* -- Convert value to a Long. Returns 1L if true, else 0L.

.. _call:

----
Call
----


A ``Call`` is a Hail data type representing a genotype call (ex: 0/0) in the Variant Dataset.

 - **gt**: *Int* -- the integer ``gt = k*(k+1)/2 + j`` for call ``j/k`` (0 = 0/0, 1 = 0/1, 2 = 1/1, 3 = 0/2, etc.).

 - **gtj()**: *Int* -- the index of allele ``j`` for call ``j/k`` (0 = ref, 1 = first alt allele, etc.).

 - **gtk()**: *Int* -- the index of allele ``k`` for call ``j/k`` (0 = ref, 1 = first alt allele, etc.).

 - **isCalled()**: *Boolean* -- True if the call is not ``./.``.

 - **isCalledNonRef()**: *Boolean* -- True if either ``isHet`` or ``isHomVar`` is true.

 - **isHet()**: *Boolean* -- True if this call is heterozygous.

 - **isHetNonRef()**: *Boolean* -- True if this call is ``j/k`` with ``j>0``.

 - **isHetRef()**: *Boolean* -- True if this call is ``0/k`` with ``k>0``.

 - **isHomRef()**: *Boolean* -- True if this call is ``0/0``.

 - **isHomVar()**: *Boolean* -- True if this call is ``j/j`` with ``j>0``.

 - **isNotCalled()**: *Boolean* -- True if the call is ``./.``.

 - **nNonRefAlleles()**: *Int* -- the number of called alternate alleles.

 - **oneHotAlleles(v: Variant)**: *Array[Int]*

	Produce an array of called counts for each allele in the variant (including reference). For example, calling this function with a biallelic variant on hom-ref, het, and hom-var calls will produce ``[2, 0]``, ``[1, 1]``, and ``[0, 2]`` respectively.

	**Arguments**
	
		 - **v** (*Variant*) -- :ref:`variant`
	       

 - **oneHotGenotype(v: Variant)**: *Array[Int]*

	Produces an array with one element for each possible genotype in the variant, where the called genotype is 1 and all else 0. For example, calling this function with a biallelic variant on hom-ref, het, and hom-var calls will produce ``[1, 0, 0]``, ``[0, 1, 0]``, and ``[0, 0, 1]`` respectively.

	**Arguments**
	
		 - **v** (*Variant*) -- :ref:`variant`
	       

 - **toGenotype()**: *Genotype* -- Convert this call to a Genotype.

.. _dict:

----
Dict
----


A ``Dict`` is an unordered collection of key-value pairs. Each key can only appear once in the collection.

 - **[k]**: *U*

	Returns the value for ``k``, or throws an exception if the key is not found.

	**Arguments**
	
		 - **k** (*T*) -- Key in the Dict to query.
	       

 - **contains(k: T)**: *Boolean*

	Returns true if the Dict has a key equal to ``k``, otherwise false.

	**Arguments**
	
		 - **k** (*T*) -- Key name to query.
	       

 - **get(a: T)**: *U* -- 	Returns the value of the Dict for key ``k``, or returns ``NA`` if the key is not found.

 - **isEmpty()**: *Boolean* -- Returns true if the number of elements is equal to 0. false otherwise.

 - **keySet()**: *Set[T]* -- Returns a Set containing the keys of the Dict.

 - **keys()**: *Array[T]* -- Returns an Array containing the keys of the Dict.

 - **mapValues(expr: U => V)**: *Dict[T, V]*

	Returns a new Dict produced by applying ``expr`` to each value. The keys are unmodified.

	**Arguments**
	
		 - **expr** (*U => V*) -- Lambda expression.
	       

 - **size()**: *Int* -- Number of elements in the collection.

 - **values()**: *Array[U]* -- Returns an Array containing the values of the Dict.

.. _double:

------
Double
------


 - **abs()**: *Double* -- Returns the absolute value of a number.

 - **max(a: Double)**: *Double* -- 	Returns the maximum value.

 - **min(a: Double)**: *Double* -- 	Returns the minimum value.

 - **signum()**: *Int* -- Returns the sign of a number (1, 0, or -1).

 - **toDouble()**: *Double* -- Convert value to a Double.

 - **toFloat()**: *Float* -- Convert value to a Float.

 - **toInt()**: *Int* -- Convert value to an Integer.

 - **toLong()**: *Long* -- Convert value to a Long.

.. _float:

-----
Float
-----


 - **abs()**: *Float* -- Returns the absolute value of a number.

 - **max(a: Float)**: *Float* -- 	Returns the maximum value.

 - **min(a: Float)**: *Float* -- 	Returns the minimum value.

 - **signum()**: *Int* -- Returns the sign of a number (1, 0, or -1).

 - **toDouble()**: *Double* -- Convert value to a Double.

 - **toFloat()**: *Float* -- Convert value to a Float.

 - **toInt()**: *Int* -- Convert value to an Integer.

 - **toLong()**: *Long* -- Convert value to a Long.

.. _genotype:

--------
Genotype
--------


A ``Genotype`` is a Hail data type representing a genotype in the Variant Dataset. It is referred to as ``g`` in the expression language.

 - **ad**: *Array[Int]* -- allelic depth for each allele.

 - **call()**: *Call* -- the integer ``gt = k*(k+1)/2 + j`` for call ``j/k`` (0 = 0/0, 1 = 0/1, 2 = 1/1, 3 = 0/2, etc.).

 - **dosage**: *Double* -- the expected number of non-reference alleles based on genotype probabilities.

 - **dp**: *Int* -- the total number of informative reads.

 - **fakeRef**: *Boolean* -- True if this genotype was downcoded in :py:meth:`~hail.VariantDataset.split_multi`.  This can happen if a ``1/2`` call is split to ``0/1``, ``0/1``.

 - **fractionReadsRef()**: *Double* -- the ratio of ref reads to the sum of all *informative* reads.

 - **gp**: *Array[Double]* -- the linear-scaled probabilities.

 - **gq**: *Int* -- the difference between the two smallest PL entries.

 - **gt**: *Int* -- the integer ``gt = k*(k+1)/2 + j`` for call ``j/k`` (0 = 0/0, 1 = 0/1, 2 = 1/1, 3 = 0/2, etc.).

 - **gtj()**: *Int* -- the index of allele ``j`` for call ``j/k`` (0 = ref, 1 = first alt allele, etc.).

 - **gtk()**: *Int* -- the index of allele ``k`` for call ``j/k`` (0 = ref, 1 = first alt allele, etc.).

 - **isCalled()**: *Boolean* -- True if the genotype is not ``./.``.

 - **isCalledNonRef()**: *Boolean* -- True if either ``g.isHet`` or ``g.isHomVar`` is true.

 - **isHet()**: *Boolean* -- True if this call is heterozygous.

 - **isHetNonRef()**: *Boolean* -- True if this call is ``j/k`` with ``j>0``.

 - **isHetRef()**: *Boolean* -- True if this call is ``0/k`` with ``k>0``.

 - **isHomRef()**: *Boolean* -- True if this call is ``0/0``.

 - **isHomVar()**: *Boolean* -- True if this call is ``j/j`` with ``j>0``.

 - **isLinearScale**: *Boolean* -- True if the data was imported from :py:meth:`~hail.HailContext.import_gen` or :py:meth:`~hail.HailContext.import_bgen`.

 - **isNotCalled()**: *Boolean* -- True if the genotype is ``./.``.

 - **nNonRefAlleles()**: *Int* -- the number of called alternate alleles.

 - **od()**: *Int* -- ``od = dp - ad.sum``.

 - **oneHotAlleles(v: Variant)**: *Array[Int]*

	Produce an array of called counts for each allele in the variant (including reference). For example, calling this function with a biallelic variant on hom-ref, het, and hom-var genotypes will produce ``[2, 0]``, ``[1, 1]``, and ``[0, 2]`` respectively.

	**Arguments**
	
		 - **v** (*Variant*) -- :ref:`variant`
	       

 - **oneHotGenotype(v: Variant)**: *Array[Int]*

	Produces an array with one element for each possible genotype in the variant, where the called genotype is 1 and all else 0. For example, calling this function with a biallelic variant on hom-ref, het, and hom-var genotypes will produce ``[1, 0, 0]``, ``[0, 1, 0]``, and ``[0, 0, 1]`` respectively.

	**Arguments**
	
		 - **v** (*Variant*) -- :ref:`variant`
	       

 - **pAB()**: *Double* -- p-value for pulling the given allelic depth from a binomial distribution with mean 0.5.  Missing if the call is not heterozygous.

 - **pl**: *Array[Int]*

	phred-scaled normalized genotype likelihood values. The conversion between
	``g.pl`` (Phred-scaled likelihoods) and ``g.gp`` (linear-scaled
	probabilities) assumes a uniform prior.

.. _int:

---
Int
---


 - **abs()**: *Int* -- Returns the absolute value of a number.

 - **max(a: Int)**: *Int* -- 	Returns the maximum value.

 - **min(a: Int)**: *Int* -- 	Returns the minimum value.

 - **signum()**: *Int* -- Returns the sign of a number (1, 0, or -1).

 - **toDouble()**: *Double* -- Convert value to a Double.

 - **toFloat()**: *Float* -- Convert value to a Float.

 - **toInt()**: *Int* -- Convert value to an Integer.

 - **toLong()**: *Long* -- Convert value to a Long.

.. _interval:

--------
Interval
--------


An ``Interval`` is a Hail data type representing a range of genomic locations in the Variant Dataset.

 - **contains(locus: Locus)**: *Boolean*

	Returns true if the ``locus`` is in the interval.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let i = Interval(Locus("1", 1000), Locus("1", 2000)) in i.contains(Locus("1", 1500))
	    result: true

	**Arguments**
	
		 - **locus** (*Locus*) -- :ref:`locus`
	       

 - **end**: *Locus* -- :ref:`locus` at the end of the interval (exclusive).

 - **start**: *Locus* -- :ref:`locus` at the start of the interval (inclusive).

.. _locus:

-----
Locus
-----


A ``Locus`` is a Hail data type representing a specific genomic location in the Variant Dataset.

 - **contig**: *String* -- String representation of contig.

 - **position**: *Int* -- Chromosomal position.

.. _long:

----
Long
----


 - **abs()**: *Long* -- Returns the absolute value of a number.

 - **max(a: Long)**: *Long* -- 	Returns the maximum value.

 - **min(a: Long)**: *Long* -- 	Returns the minimum value.

 - **signum()**: *Int* -- Returns the sign of a number (1, 0, or -1).

 - **toDouble()**: *Double* -- Convert value to a Double.

 - **toFloat()**: *Float* -- Convert value to a Float.

 - **toInt()**: *Int* -- Convert value to an Integer.

 - **toLong()**: *Long* -- Convert value to a Long.

.. _set:

---
Set
---


A ``Set`` is an unordered collection with no repeated values of a given data type (ex: Int, String). Sets can be constructed by specifying ``[item1, item2, ...].toSet()``.

.. code-block:: text
    :emphasize-lines: 2

    let s = ["rabbit", "cat", "dog", "dog"].toSet()
    result: Set("cat", "dog", "rabbit")

They can also be nested such as Set[Set[Int]]:

.. code-block:: text
    :emphasize-lines: 2

    let s = [[1, 2, 3].toSet(), [4, 5, 5].toSet()].toSet()
    result: Set(Set(1, 2, 3), Set(4, 5))

.. _set[double]:

Set[Double]
===========

 - **max()**: *Double* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Double* -- Median value of the collection.

 - **min()**: *Double* -- Smallest element in the collection.

 - **product()**: *Double* -- Product of all elements in the collection (returns 1 if empty).

 - **sum()**: *Double* -- Sum of all elements in the collection.

.. _set[float]:

Set[Float]
==========

 - **max()**: *Float* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Float* -- Median value of the collection.

 - **min()**: *Float* -- Smallest element in the collection.

 - **product()**: *Float* -- Product of all elements in the collection (returns 1 if empty).

 - **sum()**: *Float* -- Sum of all elements in the collection.

.. _set[int]:

Set[Int]
========

 - **max()**: *Int* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Int* -- Median value of the collection.

 - **min()**: *Int* -- Smallest element in the collection.

 - **product()**: *Int* -- Product of all elements in the collection (returns 1 if empty).

 - **sum()**: *Int* -- Sum of all elements in the collection.

.. _set[long]:

Set[Long]
=========

 - **max()**: *Long* -- Largest element in the collection.

 - **mean()**: *Double* -- Mean value of the collection.

 - **median()**: *Long* -- Median value of the collection.

 - **min()**: *Long* -- Smallest element in the collection.

 - **product()**: *Long* -- Product of all elements in the collection (returns 1 if empty).

 - **sum()**: *Long* -- Sum of all elements in the collection.

.. _set[set[t]]:

Set[Set[T]]
===========

 - **flatten()**: *Set[T]*

	Flattens a nested set by concatenating all its elements into a single set.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [[1, 2].toSet(), [3, 4].toSet()].toSet() in s.flatten()
	    result: Set(1, 2, 3, 4)

.. _set[string]:

Set[String]
===========

 - **mkString(delimiter: String)**: *String*

	Concatenates all elements of this set into a single string where each element is separated by the ``delimiter``.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [1, 2, 3].toSet() in s.mkString(",")
	    result: "1,2,3"

	**Arguments**
	
		 - **delimiter** (*String*) -- String that separates each element.
	       

.. _set[t]:

Set[T]
======

 - **add(a: T)**: *Set[T]* -- 	Returns the result of adding the element `a` to this Set.

 - **contains(x: T)**: *Boolean*

	Returns true if the element ``x`` is contained in the set, otherwise false.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [1, 2, 3].toSet() in s.contains(5)
	    result: false

	**Arguments**
	
		 - **x** (*T*) -- Value to test.
	       

 - **difference(a: Set[T])**: *Set[T]* -- 	Returns the elements of this Set that are not in Set `a`.

 - **exists(expr: T => Boolean)**: *Boolean*

	Returns a boolean which is true if **any** element in the set satisfies the condition given by ``expr`` and false otherwise.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [0, 2, 4, 6, 8, 10].toSet() in s.exists(e => e % 2 == 1)
	    result: false

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **filter(expr: T => Boolean)**: *Set[T]*

	Returns a new set subsetted to the elements where ``expr`` evaluates to true.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [1, 4, 5, 6, 10].toSet() in s.filter(e => e >= 5)
	    result: Set(5, 6, 10)

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **find(expr: T => Boolean)**: *T*

	Returns the first non-missing element of the array for which expr is true. If no element satisfies the predicate, find returns NA.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [1, 2, 3].toSet() in s.find(e => e % 3 == 0)
	    result: 3

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **flatMap(expr: T => Set[U])**: *Set[U]*

	Returns a new set by applying a function to each subset and concatenating the resulting sets.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [["a", "b", "c"].toSet(), ["d", "e"].toSet(), ["f"].toSet()].toSet() in s.flatMap(e => e + "1")
	    result: Set("a1", "b1", "c1", "d1", "e1", "f1")

	**Arguments**
	
		 - **expr** (*T => Set[U]*) -- Lambda expression.
	       

 - **forall(expr: T => Boolean)**: *Boolean*

	Returns a boolean which is true if **all** elements in the set satisfies the condition given by ``expr`` and false otherwise.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [0.1, 0.5, 0.3, 1.0, 2.5, 3.0].toSet() in s.forall(e => e > 1.0 == 0)
	    result: false

	**Arguments**
	
		 - **expr** (*T => Boolean*) -- Lambda expression.
	       

 - **groupBy(a: T => U)**: *Dict[U, Set[T]]*

 - **head()**: *T* -- Select one element.

 - **intersection(a: Set[T])**: *Set[T]* -- 	Returns the intersection of this Set and Set `a`.

 - **isEmpty()**: *Boolean* -- Returns true if the number of elements is equal to 0. false otherwise.

 - **issubset(a: Set[T])**: *Boolean* -- 	Returns true if this Set is a subset of Set `a`.

 - **map(expr: T => U)**: *Set[U]*

	Returns a new set produced by applying ``expr`` to each element.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = [1, 2, 3].toSet() in s.map(e => e * 3)
	    result: Set(3, 6, 9)

	**Arguments**
	
		 - **expr** (*T => U*) -- Lambda expression.
	       

 - **remove(a: T)**: *Set[T]* -- 	Returns the result of removing the element `a` from this Set.

 - **size()**: *Int* -- Number of elements in the collection.

 - **tail()**: *Set[T]* -- Select all elements except the element returned by ``head``.

 - **toArray()**: *Array[T]* -- Convert collection to an Array.

 - **toSet()**: *Set[T]* -- Convert collection to a Set.

 - **union(a: Set[T])**: *Set[T]* -- 	Returns the union of this Set and Set `a`.

.. _string:

------
String
------


 - **[:\j]**: *String*

	Returns a slice of the string from the first unicode-codepoint until the
	j*th* unicode-codepoint (0-indexed). Negative indices are interpreted as
	offsets from the end of the string.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcdef" in a[:4]
	    result: "abcd"
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcdef" in a[:-3]
	    result: "abc"

	**Arguments**
	
		 - **j** (*Int*) -- End index of the slice (not included in result).
	       

 - **[:]**: *String*

	Returns a copy of the string.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcd" in a[:]
	    result: "abcd"

 - **[\i:\j]**: *String*

	Returns a slice of the string from the i*th* unicode-codepoint until the
	j*th* unicode-codepoint (both 0-indexed). Negative indices are interpreted
	as offsets from the end of the string instead of the beginning.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcdef" in a[2:4]
	    result: "cd"
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcdef" in a[-3:-1]
	    result: "de"
	
	A handy way to understand the behavior of negative indicies is to recall
	this rule, which holds for any positive integers i and j:
	
	.. code-block:: text
	
	    s[-i:-j] == s[s.length - i, s.length - j]

	**Arguments**
	
		 - **i** (*Int*) -- Starting index of the slice.
		 - **j** (*Int*) -- End index of the slice (not included in result).
	       

 - **[\i:]**: *String*

	Returns a slice of the string from the i*th* unicode-codepoint (0-indexed)
	to the end. Negative indices are interpreted as offsets from the end of the
	string.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcdef" in a[3:]
	    result: "def"
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let a = "abcdef" in a[-5:]
	    result: "bcdef"

	**Arguments**
	
		 - **i** (*Int*) -- Starting index of the slice.
	       

 - **[i]**: *String*

	Returns the i*th* element (0-indexed) of the string, or throws an exception if ``i`` is an invalid index.
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = "genetics" in s[6]
	    result: "c"

	**Arguments**
	
		 - **i** (*Int*) -- Index of the character to return.
	       

 - **entropy()**: *Double*

	Computes the `Shannon entropy <https://en.wikipedia.org/wiki/Entropy#Information_theory>`__ in bits of the
	character frequency distribution.

 - **length()**: *Int* -- Length of the string.

 - **max(a: String)**: *String* -- 	Returns the maximum value.

 - **min(a: String)**: *String* -- 	Returns the minimum value.

 - **replace(pattern1: String, pattern2: String)**: *String*

	Replaces each substring of this string that matches the given `regular expression <https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html>`_ (``pattern1``) with the given replacement (``pattern2``).
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = "1kg-NA12878" in a.replace("1kg-", "")
	    result: "NA12878"

	**Arguments**
	
		 - **pattern1** (*String*) -- Substring to replace.
		 - **pattern2** (*String*) -- Replacement string.
	       

 - **split(delim: String, n: Int)**: *Array[String]*

	Returns an array of strings, split on the given regular expression delimiter with the pattern applied ``n`` times. See the documentation on `regular expression syntax <https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html>`_ delimiter. If you need to split on special characters, escape them with double backslash (\\\\).
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = "1kg-NA12878" in s.split("-")
	    result: ["1kg", "NA12878"]

	**Arguments**
	
		 - **delim** (*String*) -- Regular expression delimiter.
		 - **n** (*Int*) -- Number of times the pattern is applied. See the `Java documentation <https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#split-java.lang.String-int->`_ for more information.
	       

 - **split(delim: String)**: *Array[String]*

	Returns an array of strings, split on the given regular expression delimiter. See the documentation on `regular expression syntax <https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html>`_ delimiter. If you need to split on special characters, escape them with double backslash (\\\\).
	
	.. code-block:: text
	    :emphasize-lines: 2
	
	    let s = "1kg-NA12878" in s.split("-")
	    result: ["1kg", "NA12878"]

	**Arguments**
	
		 - **delim** (*String*) -- Regular expression delimiter.
	       

 - **toDouble()**: *Double* -- Convert value to a Double.

 - **toFloat()**: *Float* -- Convert value to a Float.

 - **toInt()**: *Int* -- Convert value to an Integer.

 - **toLong()**: *Long* -- Convert value to a Long.

.. _struct:

------
Struct
------


A ``Struct`` is like a Python tuple where the fields are named and the set of fields is fixed.

An example of constructing and accessing the fields in a ``Struct`` is

.. code-block:: text
    :emphasize-lines: 2

    let s = {gene: "ACBD", function: "LOF", nHet: 12} in s.gene
    result: "ACBD"

A field of the ``Struct`` can also be another ``Struct``. For example, ``va.info.AC`` selects the struct ``info`` from the struct ``va``, and then selects the array ``AC`` from the struct ``info``.

.. _variant:

-------
Variant
-------


A ``Variant`` is a Hail data type representing a variant in the Variant Dataset. It is referred to as ``v`` in the expression language.

The `pseudoautosomal region <https://en.wikipedia.org/wiki/Pseudoautosomal_region>`_ (PAR) is currently defined with respect to reference `GRCh37 <http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/human/>`_:

- X: 60001 - 2699520, 154931044 - 155260560
- Y: 10001 - 2649520, 59034050 - 59363566

Most callers assign variants in PAR to X.

 - **alt()**: *String* -- Alternate allele sequence.  **Assumes biallelic.**

 - **altAllele()**: *AltAllele* -- The :ref:`alternate allele <altallele>`.  **Assumes biallelic.**

 - **altAlleles**: *Array[AltAllele]* -- The :ref:`alternate alleles <altallele>`.

 - **contig**: *String* -- String representation of contig, exactly as imported. *NB: Hail stores contigs as strings. Use double-quotes when checking contig equality.*

 - **inXNonPar()**: *Boolean* -- True if chromosome is X and start is not in pseudoautosomal region of X.

 - **inXPar()**: *Boolean* -- True if chromosome is X and start is in pseudoautosomal region of X.

 - **inYNonPar()**: *Boolean* -- True if chromosome is Y and start is not in pseudoautosomal region of Y.

 - **inYPar()**: *Boolean* -- True if chromosome is Y and start is in pseudoautosomal region of Y. *NB: most callers assign variants in PAR to X.*

 - **isAutosomal()**: *Boolean* -- True if chromosome is not X, not Y, and not MT.

 - **isBiallelic()**: *Boolean* -- True if `v` has one alternate allele.

 - **locus()**: *Locus* -- Chromosomal locus (chr, pos) of this variant

 - **nAlleles()**: *Int* -- Number of alleles.

 - **nAltAlleles()**: *Int* -- Number of alternate alleles, equal to ``nAlleles - 1``.

 - **nGenotypes()**: *Int* -- Number of genotypes.

 - **ref**: *String* -- Reference allele sequence.

 - **start**: *Int* -- SNP position or start of an indel.