Plot

Warning

Plotting functionality is in early stages and is experimental. Interfaces will change regularly.

Hail's plot functions use the Bokeh plotting library to create interactive figures. Most plotting functions in this module return a Bokeh figure, so you can call a function to plot your data and then extend the plot however you like by working with Bokeh directly. See the GWAS tutorial for examples.

Plot functions in Hail accept data in the form of either Python objects or Table and MatrixTable fields.

`cdf`	Create a cumulative density plot.
`pdf`
`smoothed_pdf`	Create a density plot.
`histogram`	Create a histogram.
`cumulative_histogram`	Create a cumulative histogram.
`histogram2d`	Plot a two-dimensional histogram.
`scatter`	Create an interactive scatter plot.
`qq`	Create a Quantile-Quantile plot.
`manhattan`	Create a Manhattan plot.
`output_notebook`	Configure the Bokeh output state to generate output in notebook cells when `bokeh.io.show()` is called.
`visualize_missingness`	Visualize missingness in a MatrixTable.

hail.plot.cdf(data, k=350, legend=None, title=None, normalize=True, log=False)

Create a cumulative density plot.

Parameters:

data (Struct or Float64Expression) – Sequence of data to plot.
k (int) – Accuracy parameter (passed to approx_cdf()).
legend (str) – Label of data on the x-axis.
title (str) – Title of the plot.
normalize (bool) – Whether or not the cumulative data should be normalized.
log (bool) – Whether or not the y-axis should be of type log.

Returns:

bokeh.plotting.figure

hail.plot.pdf(data, k=1000, confidence=5, legend=None, title=None, log=False, interactive=False)

hail.plot.smoothed_pdf(data, k=350, smoothing=0.5, legend=None, title=None, log=False, interactive=False, figure=None)

Create a density plot.

Parameters:

data (Struct or Float64Expression) – Sequence of data to plot.
k (int) – Accuracy parameter.
smoothing (float) – Degree of smoothing.
legend (str) – Label of data on the x-axis.
title (str) – Title of the plot.
log (bool) – Plot the log10 of the bin counts.
interactive (bool) – If True, return a handle to pass to bokeh.io.show().
figure (bokeh.plotting.figure) – If not None, add density plot to figure. Otherwise, create a new figure.

Returns:

bokeh.plotting.figure

hail.plot.histogram(data, range=None, bins=50, legend=None, title=None, log=False, interactive=False)

Create a histogram.

Notes

data can be a Float64Expression, or the result of the hist() or approx_cdf() aggregators.

Parameters:

data (Struct or Float64Expression) – Sequence of data to plot.
range (Tuple[float]) – Range of x values in the histogram.
bins (int) – Number of bins in the histogram.
legend (str) – Label of data on the x-axis.
title (str) – Title of the histogram.
log (bool) – Plot the log10 of the bin counts.

Returns:

bokeh.plotting.figure

hail.plot.cumulative_histogram(data, range=None, bins=50, legend=None, title=None, normalize=True, log=False)

Create a cumulative histogram.

Parameters:

data (Struct or Float64Expression) – Sequence of data to plot.
range (Tuple[float]) – Range of x values in the histogram.
bins (int) – Number of bins in the histogram.
legend (str) – Label of data on the x-axis.
title (str) – Title of the histogram.
normalize (bool) – Whether or not the cumulative data should be normalized.
log (bool) – Whether or not the y-axis should be of type log.

Returns:

bokeh.plotting.figure
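
As a rough illustration of what normalize changes, the sketch below accumulates per-bin counts in plain Python. It is not Hail's implementation: the function name cumulative_counts is hypothetical, and values falling outside range are ignored here for simplicity.

```python
from itertools import accumulate


def cumulative_counts(bin_counts, normalize=True):
    """Running totals of per-bin counts, optionally scaled so the last value is 1.0."""
    running = list(accumulate(bin_counts))
    # Nothing to scale if normalization is off, there are no bins, or all bins are empty.
    if not normalize or not running or running[-1] == 0:
        return running
    total = running[-1]
    return [count / total for count in running]


raw = cumulative_counts([1, 2, 1], normalize=False)
scaled = cumulative_counts([1, 2, 1], normalize=True)
```

With normalize=False the y-axis carries raw cumulative counts; with normalize=True it carries fractions of the total, which makes plots of differently sized datasets comparable.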

hail.plot.histogram2d(x, y, bins=40, range=None, title=None, width=600, height=600, colors=('#eff3ff', '#c6dbef', '#9ecae1', '#6baed6', '#4292c6', '#2171b5', '#084594'), log=False)

Plot a two-dimensional histogram.

x and y must both be NumericExpressions from the same Table.

If range is None, or the range for either axis is None, the function will do a pass through the data to determine the min and max of the corresponding variable.

Examples

>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y)

>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y, bins=10, range=((0, 1), None))

Parameters:

x (NumericExpression) – Expression for x-axis (from a Hail table).
y (NumericExpression) – Expression for y-axis (from the same Hail table as x).
bins (int or [int, int]) – The bin specification. The default value is 40.
- If int, the number of bins for the two dimensions (nx = ny = bins).
- If [int, int], the number of bins in each dimension (nx, ny = bins).
range (None or ((float, float), (float, float))) – The leftmost and rightmost edges of the bins along each dimension: ((xmin, xmax), (ymin, ymax)). All values outside of this range will be considered outliers and not tallied in the histogram. If this value is None, or either of the inner ranges is None, the range will be computed from the data.
width (int) – Plot width (default 600px).
height (int) – Plot height (default 600px).
title (str) – Title of the plot.
colors (Sequence[str]) – List of colors (hex codes, or color name strings as described in the Bokeh documentation). Compatible with the built-in Bokeh palettes.
log (bool) – Plot the log10 of the bin counts.

Returns:

bokeh.plotting.figure
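
To make the bins and range semantics concrete, here is a minimal pure-Python sketch of two-dimensional binning in which out-of-range values are treated as outliers and dropped. It is illustrative only and not Hail's code: the names bin_index and histogram2d_counts are hypothetical, and placing a value equal to the upper edge in the last bin is an assumed convention.

```python
from collections import Counter


def bin_index(value, lo, hi, n_bins):
    """Return the bin that value falls into, or None if it lies outside [lo, hi]."""
    if value < lo or value > hi:
        return None  # outlier: not tallied
    if value == hi:
        return n_bins - 1  # assumed convention: upper edge belongs to the last bin
    return int((value - lo) / (hi - lo) * n_bins)


def histogram2d_counts(points, x_range, y_range, bins):
    """Count points per (x_bin, y_bin) cell; bins is an int or an (nx, ny) pair."""
    nx, ny = (bins, bins) if isinstance(bins, int) else bins
    counts = Counter()
    for x, y in points:
        i = bin_index(x, x_range[0], x_range[1], nx)
        j = bin_index(y, y_range[0], y_range[1], ny)
        if i is None or j is None:
            continue  # skip the point if either coordinate is out of range
        counts[(i, j)] += 1
    return counts


points = [(0.1, 0.1), (0.9, 0.9), (1.0, 1.0), (1.5, 0.5)]
counts = histogram2d_counts(points, (0, 1), (0, 1), bins=2)
```

The last point has x = 1.5, outside the x range (0, 1), so it does not contribute to any cell.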

hail.plot.scatter(x, y, label=None, title=None, xlabel=None, ylabel=None, size=4, legend=True, hover_fields=None, colors=None, width=800, height=800, collect_all=None, n_divisions=500, missing_label='NA')

Create an interactive scatter plot.

x and y must both be either:
- a NumericExpression from the same Table, or
- a tuple (str, NumericExpression) from the same Table. If passed as a tuple, the first element is used as the hover label.

If no label or a single label is provided, returns a bokeh.plotting.figure. Otherwise, returns a bokeh.models.layouts.Column containing:
- a bokeh.models.widgets.inputs.Select dropdown selection widget for labels
- a bokeh.plotting.figure containing the interactive scatter plot

Points are colored by one of the labels defined in label, using the color scheme in the corresponding entry of colors if provided (otherwise a default scheme is used). To specify your own color mapper, check the Bokeh documentation for CategoricalMapper for categorical labels, and for LinearColorMapper and LogColorMapper for continuous labels. For categorical labels, clicking on one of the items in the legend hides or shows all points with the corresponding label. Note that using many different labelling schemes in the same plot, particularly if those labels contain many different classes, could slow down the plot interactions.

Hovering on points will display their coordinates, labels and any additional fields specified in hover_fields.

Parameters:

x (NumericExpression or (str, NumericExpression)) – List of x-values to be plotted.
y (NumericExpression or (str, NumericExpression)) – List of y-values to be plotted.
label (Expression or Dict[str, Expression], optional) – Either a single expression (if a single label is desired), or a dictionary of label name -> label value for x and y values. Used to color each point according to its label. When multiple labels are given, a dropdown is displayed with the different options. Can be used with categorical or continuous expressions.
title (str, optional) – Title of the scatterplot.
xlabel (str, optional) – X-axis label.
ylabel (str, optional) – Y-axis label.
size (int) – Size of markers in screen space units.
legend (bool) – Whether or not to show the legend in the resulting figure.
hover_fields (Dict[str, Expression], optional) – Extra fields to be displayed when hovering over a point on the plot.
colors (bokeh.models.mappers.ColorMapper or Dict[str, bokeh.models.mappers.ColorMapper], optional) – If a single label is used, this can be a single color mapper; if multiple labels are used, this should be a Dict of label name -> color mapper. Used to set colors for the labels defined using label. If colors is not given, or a label name does not appear in this dict, that label is colored using a default color scheme.
width (int) – Plot width
height (int) – Plot height
collect_all (bool, optional) – Deprecated. Use n_divisions instead.
n_divisions (int, optional) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints. Use None to collect all points.
missing_label (str) – Label to use when a point is missing data for a categorical label

Returns:

bokeh.models.Plot if no label or a single label was given, otherwise bokeh.models.layouts.Column
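
One way to picture the effect of n_divisions is grid-based thinning: the plot area is divided into n_divisions cells along each axis and at most one point is kept per cell. The sketch below is a conceptual stand-in written in plain Python, not Hail's downsampling code; the name grid_downsample and the choice to keep the first point seen in each cell are assumptions made for illustration.

```python
def grid_downsample(points, n_divisions):
    """Keep at most one (x, y) point per cell of an n_divisions x n_divisions grid."""
    if not points:
        return []
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def cell(value, lo, hi):
        if hi == lo:
            return 0  # degenerate axis: everything shares one cell
        index = int((value - lo) / (hi - lo) * n_divisions)
        return min(index, n_divisions - 1)  # the maximum value goes in the last cell

    kept = {}
    for x, y in points:
        # setdefault keeps the first point that lands in each cell
        kept.setdefault((cell(x, x_lo, x_hi), cell(y, y_lo, y_hi)), (x, y))
    return list(kept.values())


diagonal = [(i, i) for i in range(100)]
coarse = grid_downsample(diagonal, 5)
fine = grid_downsample(diagonal, 10)
```

A smaller n_divisions gives a coarser grid and therefore fewer retained points, which matches the parameter description above.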

hail.plot.qq(pvals, label=None, title='Q-Q plot', xlabel='Expected -log10(p)', ylabel='Observed -log10(p)', size=6, legend=True, hover_fields=None, colors=None, width=800, height=800, collect_all=None, n_divisions=500, missing_label='NA')

Create a Quantile-Quantile plot. (https://en.wikipedia.org/wiki/Q-Q_plot)

If no label or a single label is provided, returns a bokeh.plotting.figure. Otherwise, returns a bokeh.models.layouts.Column containing:
- a bokeh.models.widgets.inputs.Select dropdown selection widget for labels
- a bokeh.plotting.figure containing the interactive Q-Q plot

Points are colored by one of the labels defined in label, using the color scheme in the corresponding entry of colors if provided (otherwise a default scheme is used). To specify your own color mapper, check the Bokeh documentation for CategoricalMapper for categorical labels, and for LinearColorMapper and LogColorMapper for continuous labels. For categorical labels, clicking on one of the items in the legend hides or shows all points with the corresponding label. Note that using many different labelling schemes in the same plot, particularly if those labels contain many different classes, could slow down the plot interactions.

Hovering on points will display their coordinates, labels and any additional fields specified in hover_fields.

Parameters:

pvals (NumericExpression) – P-values to be plotted.
label (Expression or Dict[str, Expression], optional) – Either a single expression (if a single label is desired), or a dictionary of label name -> label value. Used to color each point according to its label. When multiple labels are given, a dropdown is displayed with the different options. Can be used with categorical or continuous expressions.
title (str, optional) – Title of the plot.
xlabel (str, optional) – X-axis label.
ylabel (str, optional) – Y-axis label.
size (int) – Size of markers in screen space units.
legend (bool) – Whether or not to show the legend in the resulting figure.
hover_fields (Dict[str, Expression], optional) – Extra fields to be displayed when hovering over a point on the plot.
colors (bokeh.models.mappers.ColorMapper or Dict[str, bokeh.models.mappers.ColorMapper], optional) – If a single label is used, this can be a single color mapper; if multiple labels are used, this should be a Dict of label name -> color mapper. Used to set colors for the labels defined using label. If colors is not given, or a label name does not appear in this dict, that label is colored using a default color scheme.
width (int) – Plot width
height (int) – Plot height
collect_all (bool) – Deprecated. Use n_divisions instead.
n_divisions (int, optional) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints. Use None to collect all points.
missing_label (str) – Label to use when a point is missing data for a categorical label

Returns:

bokeh.plotting.figure if no label or a single label was given, otherwise bokeh.models.layouts.Column
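
For readers unfamiliar with how the two axes of a Q-Q plot of p-values relate, the following sketch pairs each observed p-value with the value expected under a uniform null distribution. It uses the common rank / (n + 1) convention, which is an assumption here and not necessarily the exact formula Hail uses; qq_points is a hypothetical helper name.

```python
import math


def qq_points(pvals):
    """Return (expected, observed) -log10(p) pairs, smallest p-value first.

    Assumes every p-value satisfies 0 < p <= 1.
    """
    n = len(pvals)
    points = []
    for rank, p in enumerate(sorted(pvals), start=1):
        expected = -math.log10(rank / (n + 1))  # uniform-null quantile for this rank
        observed = -math.log10(p)
        points.append((expected, observed))
    return points


# P-values spaced exactly like uniform quantiles fall on the diagonal.
pts = qq_points([0.5, 0.25, 0.75])
```

Points that rise above the diagonal at the upper right indicate p-values smaller than expected under the null.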

hail.plot.manhattan(pvals, locus=None, title=None, size=4, hover_fields=None, collect_all=None, n_divisions=500, significance_line=5e-08)

Create a Manhattan plot. (https://en.wikipedia.org/wiki/Manhattan_plot)

Parameters:

pvals (Float64Expression) – P-values to be plotted.
locus (LocusExpression, optional) – Locus values to be plotted.
title (str, optional) – Title of the plot.
size (int) – Size of markers in screen space units.
hover_fields (Dict[str, Expression], optional) – Dictionary of field names and values to be shown in the HoverTool of the plot.
collect_all (bool, optional) – Deprecated - use n_divisions instead.
n_divisions (int, optional) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints. Use None to collect all points.
significance_line (float, optional) – p-value at which to add a horizontal, dotted red line indicating genome-wide significance. If None, no line is added.

Returns:

bokeh.models.Plot
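
Manhattan plots conventionally show -log10(p) on the y-axis, so a significance_line given as a p-value corresponds to a height on that transformed scale. The small sketch below shows the conversion, assuming that conventional axis; line_height is a hypothetical helper, not part of Hail.

```python
import math


def line_height(significance_line=5e-08):
    """Convert a p-value threshold to its position on a -log10(p) axis."""
    return -math.log10(significance_line)


default_height = line_height()  # roughly 7.3 for the default threshold
```

Points plotted above this height have p-values below the threshold.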

hail.plot.output_notebook()

Configure the Bokeh output state to generate output in notebook cells when bokeh.io.show() is called. Calls bokeh.io.output_notebook().

hail.plot.visualize_missingness(entry_field, row_field=None, column_field=None, window=6000000, plot_width=1800, plot_height=900)

Visualize missingness in a MatrixTable.

Inspired by naniar.

The row field is windowed by default, and missingness is aggregated over each window to give the proportion of defined (non-missing) entries. The window is set to 6,000,000 by default, so that the human genome is divided into ~500 rows. With ~2,000 columns, this windowing yields a sensibly sized plot.

Warning

Generating a plot with more than ~1M points takes a long time for Bokeh to render. Consider windowing carefully.
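
The arithmetic behind these numbers can be checked with a short calculation. The sketch below assumes a human genome length of roughly 3.1 billion base pairs (an approximation introduced here, not a value taken from Hail), and plot_size is a hypothetical helper.

```python
GENOME_LENGTH_BP = 3_100_000_000  # assumed approximate human genome length


def plot_size(n_columns, window=6_000_000, genome_length=GENOME_LENGTH_BP):
    """Estimate (rows, total points) for a windowed missingness plot."""
    n_rows = (genome_length + window - 1) // window  # ceiling division
    return n_rows, n_rows * n_columns


rows, n_points = plot_size(n_columns=2000)
```

Under these assumptions the default window gives about 500 rows and about one million points for 2,000 columns, which is right at the threshold mentioned in the warning. Shrinking the window by a factor of ten multiplies the point count by roughly ten.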

Parameters:

entry_field (Expression) – Field for which to check missingness.
row_field (NumericExpression or LocusExpression) – Row field to use for y-axis (can be windowed). If not provided, the row key will be used.
column_field (StringExpression) – Column field to use for x-axis. If not provided, the column key will be used.
window (int, optional) – Size of window to summarize by row_field. If set to None, each field will be used individually.
plot_width (int) – Plot width in px.
plot_height (int) – Plot height in px.

Returns:

bokeh.plotting.figure