Plot

Warning

Plotting functionality is in early stages and is experimental. Interfaces will change regularly.

Plotting in Hail is easy. Hail’s plot functions use the Bokeh plotting library to create attractive, interactive figures. Plotting functions in this module return a Bokeh Figure (scatter returns a Column instead when several labels are given), so you can call a method to plot your data and then extend the plot however you like by interacting directly with Bokeh. See the GWAS tutorial for examples.

Plot functions in Hail accept data in the form of either Python objects or Table and MatrixTable fields.
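As a minimal sketch of that workflow, the function below plots a Table field and then extends the returned figure with plain Bokeh calls. It assumes Hail and Bokeh are installed; the function name `histogram_with_marker`, the field name `x`, and the output file name are illustrative choices, not part of Hail.

```python
def histogram_with_marker(output_path="histogram.html"):
    """Plot a Hail histogram, then extend the returned Bokeh figure."""
    # Imports live inside the function so that defining it does not
    # require a working Hail/Spark installation.
    import hail as hl
    from bokeh.io import output_file, save
    from bokeh.models import Span

    # A small table with one normally distributed float64 field.
    ht = hl.utils.range_table(1000).annotate(x=hl.rand_norm())

    # The Table field is passed to the plot function directly.
    p = hl.plot.histogram(ht.x, range=(-4, 4), bins=40,
                          legend="x", title="rand_norm draws")

    # From here on, p is an ordinary Bokeh figure: add a vertical
    # dashed line at x = 0 using Bokeh itself.
    p.add_layout(Span(location=0, dimension="height",
                      line_color="red", line_dash="dashed"))

    # Write the figure to a standalone HTML file.
    output_file(output_path)
    save(p)
    return p
```

Calling `histogram_with_marker()` writes the HTML file and returns the figure, which can be customized further in the same way.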

histogram – Create a histogram.
cumulative_histogram – Create a cumulative histogram.
histogram2d – Plot a two-dimensional histogram.
scatter – Create an interactive scatter plot.
qq – Create a Quantile-Quantile plot.
manhattan – Create a Manhattan plot.

hail.plot.histogram(data, range=None, bins=50, legend=None, title=None, log=False, interactive=False)

Create a histogram.

Notes

data can be a Float64Expression, or the result of the agg.hist() or agg.approx_cdf() aggregators.

Parameters:
  • data (Struct or Float64Expression) – Sequence of data to plot.
  • range (Tuple[float]) – Range of x values in the histogram.
  • bins (int) – Number of bins in the histogram.
  • legend (str) – Label of data on the x-axis.
  • title (str) – Title of the histogram.
  • log (bool) – Plot the log10 of the bin counts.
Returns:

bokeh.plotting.figure.Figure
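The Notes above mention passing the result of an aggregator instead of an expression. A hedged sketch of that path follows, assuming Hail and Bokeh are installed; the function name `histogram_from_aggregation`, the field name `x`, and the output file name are made up for illustration.

```python
def histogram_from_aggregation(output_path="hist_precomputed.html"):
    """Aggregate bin counts first, then hand the result to the plot function."""
    # Lazy imports: defining this function does not require Hail.
    import hail as hl
    from bokeh.io import output_file, save

    ht = hl.utils.range_table(1000).annotate(x=hl.rand_norm())

    # Run the aggregation once: 30 bins spanning [-3, 3].
    hist = ht.aggregate(hl.agg.hist(ht.x, -3, 3, 30))

    # The aggregated result is passed as `data`; the bin layout was
    # already fixed by the aggregator call above.
    p = hl.plot.histogram(hist, legend="x", title="Precomputed histogram")

    output_file(output_path)
    save(p)
    return p
```

Aggregating first can be useful when the same binned result feeds more than one plot.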

hail.plot.cumulative_histogram(data, range=None, bins=50, legend=None, title=None, normalize=True, log=False)

Create a cumulative histogram.

Parameters:
  • data (Struct or Float64Expression) – Sequence of data to plot.
  • range (Tuple[float]) – Range of x values in the histogram.
  • bins (int) – Number of bins in the histogram.
  • legend (str) – Label of data on the x-axis.
  • title (str) – Title of the histogram.
  • normalize (bool) – Whether or not the cumulative data should be normalized.
  • log (bool) – Whether or not the y-axis should be of type log.
Returns:

bokeh.plotting.figure.Figure
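To make the normalize parameter concrete, here is a small pure-Python sketch of turning per-bin counts into cumulative values. It assumes that normalizing means dividing by the total count so the last value is 1; the helper name `cumulative_counts` is hypothetical and this is not Hail's implementation.

```python
from itertools import accumulate


def cumulative_counts(bin_counts, normalize=True):
    """Return running totals of bin counts, optionally scaled to end at 1."""
    running = list(accumulate(bin_counts))
    # Nothing to scale for empty input or an all-zero histogram.
    if not normalize or not running or running[-1] == 0:
        return running
    total = running[-1]
    return [count / total for count in running]
```

With `normalize=False` the values stay as raw counts, which is the more natural choice when comparing absolute sample sizes.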

hail.plot.histogram2d(x, y, bins=40, range=None, title=None, width=600, height=600, font_size='7pt', colors=['#eff3ff', '#c6dbef', '#9ecae1', '#6baed6', '#4292c6', '#2171b5', '#084594'])

Plot a two-dimensional histogram.

x and y must both be a NumericExpression from the same Table.

If range is None, or either of its inner ranges is None, the function will make a pass through the data to determine the min and max of the corresponding variable.

Examples

>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y)
>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y, bins=10, range=((0, 1), None))
Parameters:
  • x (NumericExpression) – Expression for x-axis (from a Hail table).
  • y (NumericExpression) – Expression for y-axis (from the same Hail table as x).
  • bins (int or [int, int]) – The bin specification. If int, the number of bins for both dimensions (nx = ny = bins); if [int, int], the number of bins in each dimension (nx, ny = bins). The default value is 40.
  • range (None or ((float, float), (float, float))) – The leftmost and rightmost edges of the bins along each dimension: ((xmin, xmax), (ymin, ymax)). All values outside of this range will be considered outliers and not tallied in the histogram. If this value is None, or either of the inner lists is None, the range will be computed from the data.
  • width (int) – Plot width (default 600px).
  • height (int) – Plot height (default 600px).
  • title (str) – Title of the plot.
  • font_size (str) – String of font size in points (default ‘7pt’).
  • colors (List[str]) – List of colors (hex codes, or color strings as described in the Bokeh documentation). Compatible with any of Bokeh’s built-in palettes.
Returns:

bokeh.plotting.figure.Figure
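The bins and range semantics described above can be sketched in pure Python. This is only an illustration of the documented behavior, not Hail's implementation: the helper name `histogram2d_counts` is hypothetical, the range argument is renamed `value_range` to avoid shadowing the Python builtin, and placing values equal to the upper edge in the last bin is an assumption.

```python
def histogram2d_counts(xs, ys, bins=40, value_range=None):
    """Tally (x, y) pairs into an nx-by-ny grid of counts."""
    # int -> same bin count in both dimensions; [nx, ny] -> one per dimension.
    nx, ny = (bins, bins) if isinstance(bins, int) else bins

    # A missing range (or a missing inner range) is computed from the data.
    x_range, y_range = value_range if value_range is not None else (None, None)
    if x_range is None:
        x_range = (min(xs), max(xs))
    if y_range is None:
        y_range = (min(ys), max(ys))
    (xmin, xmax), (ymin, ymax) = x_range, y_range

    counts = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        # Values outside the range are outliers and are not tallied.
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        # Assumption: a value on the upper edge falls into the last bin.
        i = min(int((x - xmin) / (xmax - xmin) * nx), nx - 1)
        j = min(int((y - ymin) / (ymax - ymin) * ny), ny - 1)
        counts[i][j] += 1
    return counts
```

The sketch assumes each range has nonzero width; a real implementation would need to handle the degenerate case.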

hail.plot.scatter(x: Union[hail.expr.expressions.typed_expressions.NumericExpression, Tuple[str, hail.expr.expressions.typed_expressions.NumericExpression]], y: Union[hail.expr.expressions.typed_expressions.NumericExpression, Tuple[str, hail.expr.expressions.typed_expressions.NumericExpression]], label: Union[hail.expr.expressions.base_expression.Expression, Dict[str, hail.expr.expressions.base_expression.Expression]] = None, title: str = None, xlabel: str = None, ylabel: str = None, size: int = 4, legend: bool = True, hover_fields: Dict[str, hail.expr.expressions.base_expression.Expression] = None, colors: Union[bokeh.models.mappers.ColorMapper, Dict[str, bokeh.models.mappers.ColorMapper]] = None, width: int = 800, height: int = 800, collect_all: bool = False, n_divisions: int = 500, missing_label: str = 'NA') → Union[bokeh.plotting.figure.Figure, bokeh.models.layouts.Column]

Create an interactive scatter plot.

x and y must both be either:
  • a NumericExpression from the same Table, or
  • a tuple (str, NumericExpression) from the same Table. If passed as a tuple, the first element is used as the hover label.

If no label or a single label is provided, returns a bokeh.plotting.figure.Figure. Otherwise returns a bokeh.models.layouts.Column containing:
  • a bokeh.models.widgets.Select dropdown selection widget for labels
  • a bokeh.plotting.figure.Figure containing the interactive scatter plot

Points will be colored by one of the labels defined in label, using the color scheme defined in the corresponding entry of colors if provided (otherwise a default scheme is used). To specify your own color mapper, check the Bokeh documentation for CategoricalColorMapper for categorical labels, and for LinearColorMapper and LogColorMapper for continuous labels. For categorical labels, clicking on one of the items in the legend will hide/show all points with the corresponding label. Note that using many different labelling schemes in the same plot, particularly if those labels contain many different classes, could slow down the plot interactions.

Hovering on points will display their coordinates, labels and any additional fields specified in hover_fields.

Parameters:
  • x (NumericExpression or (str, NumericExpression)) – List of x-values to be plotted.
  • y (NumericExpression or (str, NumericExpression)) – List of y-values to be plotted.
  • label (Expression or Dict[str, Expression]) – Either a single expression (if a single label is desired), or a dictionary of label name -> label value for x and y values. Used to color each point according to its label. When multiple labels are given, a dropdown will be displayed with the different options. Can be used with categorical or continuous expressions.
  • title (str) – Title of the scatterplot.
  • xlabel (str) – X-axis label.
  • ylabel (str) – Y-axis label.
  • size (int) – Size of markers in screen space units.
  • legend (bool) – Whether or not to show the legend in the resulting figure.
  • hover_fields (Dict[str, Expression]) – Extra fields to be displayed when hovering over a point on the plot.
  • colors (bokeh.models.mappers.ColorMapper or Dict[str, bokeh.models.mappers.ColorMapper]) – Used to set colors for the labels defined using label. If a single label is used, this can be a single color mapper; if multiple labels are used, this should be a Dict of label name -> color mapper. If colors is not given, or a label name does not appear in this dict, a default color scheme is used.
  • width (int) – Plot width.
  • height (int) – Plot height.
  • collect_all (bool) – Whether to collect all values or downsample before plotting.
  • n_divisions (int) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints.
  • missing_label (str) – Label to use when a point is missing data for a categorical label.
Returns:

bokeh.plotting.figure.Figure if no label or a single label was given, otherwise bokeh.models.layouts.Column
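The collect_all and n_divisions parameters control downsampling, but this page does not describe the algorithm. The sketch below is a hypothetical grid-based thinning, written only to illustrate why a lower n_divisions yields fewer output points. It should not be read as Hail's actual method, and the helper name `grid_downsample` is invented for this example.

```python
def grid_downsample(points, n_divisions=500):
    """Keep at most one (x, y) point per cell of an n_divisions^2 grid."""
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, ymin = min(xs), min(ys)
    # Fall back to a span of 1.0 when all values coincide on an axis.
    x_span = (max(xs) - xmin) or 1.0
    y_span = (max(ys) - ymin) or 1.0

    kept = {}
    for x, y in points:
        i = min(int((x - xmin) / x_span * n_divisions), n_divisions - 1)
        j = min(int((y - ymin) / y_span * n_divisions), n_divisions - 1)
        # Assumption: the first point seen in a cell represents that cell.
        kept.setdefault((i, j), (x, y))
    return list(kept.values())
```

A coarser grid merges nearby points into the same cell, so dense regions lose more points than sparse ones while the overall shape of the plot is preserved.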

hail.plot.qq(pvals, collect_all=False, n_divisions=500)

Create a Quantile-Quantile plot. (https://en.wikipedia.org/wiki/Q-Q_plot)

Parameters:
  • pvals (List[float] or Float64Expression) – P-values to be plotted.
  • collect_all (bool) – Whether to collect all values or downsample before plotting. This parameter will be ignored if pvals is a Python object.
  • n_divisions (int) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints.
Returns:

bokeh.plotting.figure.Figure
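For readers unfamiliar with Q-Q plots of p-values, the following pure-Python sketch computes the coordinates such a plot typically shows. It assumes the common convention of plotting -log10 of the observed p-values against -log10 of rank / (n + 1), the quantiles expected under a uniform null. Hail's exact formula may differ, and the helper name `qq_points` is hypothetical.

```python
import math


def qq_points(pvals):
    """Return (expected, observed) pairs on the -log10 scale.

    Assumes every p-value lies in (0, 1].
    """
    observed = sorted(pvals)
    n = len(observed)
    return [
        # rank / (n + 1) is the expected p-value at this rank under the null.
        (-math.log10(rank / (n + 1)), -math.log10(p))
        for rank, p in enumerate(observed, start=1)
    ]
```

Points that rise above the diagonal indicate p-values smaller than the null distribution would predict.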

hail.plot.manhattan(pvals, locus=None, title=None, size=4, hover_fields=None, collect_all=False, n_divisions=500, significance_line=5e-08)

Create a Manhattan plot. (https://en.wikipedia.org/wiki/Manhattan_plot)

Parameters:
  • pvals (Float64Expression) – P-values to be plotted.
  • locus (LocusExpression) – Locus values to be plotted.
  • title (str) – Title of the plot.
  • size (int) – Size of markers in screen space units.
  • hover_fields (Dict[str, Expression]) – Dictionary of field names and values to be shown in the HoverTool of the plot.
  • collect_all (bool) – Whether to collect all values or downsample before plotting.
  • n_divisions (int) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints.
  • significance_line (float, optional) – p-value at which to add a horizontal, dotted red line indicating genome-wide significance. If None, no line is added.
Returns:

bokeh.plotting.figure.Figure
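As a worked illustration of the significance_line parameter, the sketch below converts the p-value threshold into a height on the plot. It assumes the y-axis shows -log10(p), which is the usual convention for Manhattan plots; the helper names `significance_height` and `significant_hits` are hypothetical and not part of Hail.

```python
import math


def significance_height(significance_line=5e-08):
    """Return the y-coordinate of the significance line, or None if disabled."""
    if significance_line is None:
        return None
    return -math.log10(significance_line)


def significant_hits(pvals, significance_line=5e-08):
    """Return the p-values that would plot above the significance line."""
    threshold = significance_height(significance_line)
    if threshold is None:
        return []
    return [p for p in pvals if -math.log10(p) > threshold]
```

With the default threshold of 5e-08, the line sits at roughly 7.3 on the -log10 scale.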