Plot

Warning

Plotting functionality is in early stages and is experimental. Interfaces will change regularly.

Hail's plot functions use the Bokeh plotting library to create interactive figures. Every plotting function in this module returns a Bokeh Figure, so you can plot your data with a single call and then extend the plot by working with Bokeh directly. See the GWAS tutorial for examples.

Plot functions in Hail accept data in the form of either Python objects or Table and MatrixTable fields.

histogram              Create a histogram.
cumulative_histogram   Create a cumulative histogram.
histogram2d            Plot a two-dimensional histogram.
scatter                Create a scatterplot.
qq                     Create a Quantile-Quantile plot.
manhattan              Create a Manhattan plot.

hail.plot.histogram(data, range=None, bins=50, legend=None, title=None, log=False)

Create a histogram.

Parameters:
  • data (Struct or Float64Expression) – Sequence of data to plot.
  • range (Tuple[float]) – Range of x values in the histogram.
  • bins (int) – Number of bins in the histogram.
  • legend (str) – Label of data on the x-axis.
  • title (str) – Title of the histogram.
  • log (bool) – Plot the log10 of the bin counts.
Returns:

bokeh.plotting.figure.Figure
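
To make the range, bins, and log parameters concrete, here is a minimal pure-Python sketch of equal-width binning. It is not Hail's implementation: the helper name bin_counts is hypothetical, and the handling of out-of-range values and of the right edge of the last bin are assumptions made for illustration.

```python
import math

def bin_counts(values, value_range, bins):
    """Count values per equal-width bin over value_range = (start, end)."""
    start, end = value_range
    width = (end - start) / bins
    counts = [0] * bins
    for v in values:
        if v < start or v > end:
            continue  # assumption: out-of-range values are not tallied in any bin
        # Assumption: the last bin is closed on the right, so v == end is kept.
        index = min(int((v - start) / width), bins - 1)
        counts[index] += 1
    return counts

counts = bin_counts([0.5, 1.5, 1.7, 3.0, 9.0], value_range=(0.0, 3.0), bins=3)

# With log=True the plotted heights are log10 of the counts;
# an empty bin has no finite log10, so it is represented as None here.
log_counts = [math.log10(c) if c > 0 else None for c in counts]
```

Here 9.0 falls outside the requested range and is dropped, while 3.0 lands in the last bin.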

hail.plot.cumulative_histogram(data, range=None, bins=50, legend=None, title=None, normalize=True, log=False)

Create a cumulative histogram.

Parameters:
  • data (Struct or Float64Expression) – Sequence of data to plot.
  • range (Tuple[float]) – Range of x values in the histogram.
  • bins (int) – Number of bins in the histogram.
  • legend (str) – Label of data on the x-axis.
  • title (str) – Title of the histogram.
  • normalize (bool) – Whether to normalize the cumulative data.
  • log (bool) – Whether to use a log-scale y-axis.
Returns:

bokeh.plotting.figure.Figure
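
The effect of normalize can be sketched with a running sum over per-bin counts. This is an illustration only: cumulative_counts is a hypothetical helper, and dividing by the total of the binned counts is an assumption about what normalization means here.

```python
from itertools import accumulate

def cumulative_counts(counts, normalize=True):
    """Turn per-bin counts into cumulative counts, optionally scaled to end at 1."""
    running = list(accumulate(counts))
    total = running[-1] if running else 0
    if normalize and total > 0:
        # Assumption: normalization divides by the total number of binned values.
        return [c / total for c in running]
    return running

raw = cumulative_counts([1, 2, 1], normalize=False)   # running totals
scaled = cumulative_counts([1, 2, 1])                 # fractions of the total
```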

hail.plot.histogram2d(x, y, bins=40, range=None, title=None, width=600, height=600, font_size='7pt', colors=['#eff3ff', '#c6dbef', '#9ecae1', '#6baed6', '#4292c6', '#2171b5', '#084594'])

Plot a two-dimensional histogram.

x and y must both be a NumericExpression from the same Table.

If range is None, or the range for either axis is None, the function makes a pass through the data to determine the min and max of the corresponding variable.

Examples

>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y)
>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y, bins=10, range=((0, 1), None))
Parameters:
  • x (NumericExpression) – Expression for x-axis (from a Hail table).
  • y (NumericExpression) – Expression for y-axis (from the same Hail table as x).
  • bins (int or [int, int]) – The bin specification. If int, the number of bins for both dimensions (nx = ny = bins). If [int, int], the number of bins in each dimension (nx, ny = bins). The default value is 40.
  • range (None or ((float, float), (float, float))) – The leftmost and rightmost edges of the bins along each dimension: ((xmin, xmax), (ymin, ymax)). All values outside of this range will be considered outliers and not tallied in the histogram. If this value is None, or either of the inner lists is None, the range will be computed from the data.
  • width (int) – Plot width (default 600px).
  • height (int) – Plot height (default 600px).
  • title (str) – Title of the plot.
  • font_size (str) – String of font size in points (default ‘7pt’).
  • colors (List[str]) – List of colors (hex codes, or color strings as described in the Bokeh documentation). Compatible with any of Bokeh's built-in palettes.
Returns:

bokeh.plotting.figure.Figure
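
The bins and range rules above can be sketched in plain Python. This is not Hail's implementation: every function name below is hypothetical, the parameter is called value_range to avoid shadowing the built-in range, and treating the right edge of the last bin as inclusive is an assumption.

```python
def resolve_bins(bins):
    """An int means nx = ny = bins; a pair means (nx, ny)."""
    if isinstance(bins, int):
        return bins, bins
    nx, ny = bins
    return nx, ny

def resolve_range(values, given):
    """Use the given (min, max), or compute it from the data when it is None."""
    return given if given is not None else (min(values), max(values))

def bin_index(v, lo, hi, n):
    span = (hi - lo) or 1.0  # guard against a zero-width range
    return min(int((v - lo) / span * n), n - 1)

def histogram2d_counts(xs, ys, bins=40, value_range=None):
    nx, ny = resolve_bins(bins)
    x_given, y_given = value_range if value_range is not None else (None, None)
    xmin, xmax = resolve_range(xs, x_given)
    ymin, ymax = resolve_range(ys, y_given)
    counts = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue  # outliers are not tallied
        counts[bin_index(x, xmin, xmax, nx)][bin_index(y, ymin, ymax, ny)] += 1
    return counts

# x range is fixed to (0, 1); the y range is None, so it is computed from the data.
grid = histogram2d_counts(
    [0.1, 0.6, 0.9, 2.0], [0.0, 1.0, 3.0, 4.0],
    bins=[2, 2], value_range=((0, 1), None),
)
```

The point with x = 2.0 lies outside the requested x range, so it is excluded from every cell.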

hail.plot.scatter(x, y, label=None, title=None, xlabel=None, ylabel=None, size=4, legend=True, collect_all=False, n_divisions=500, source_fields=None)

Create a scatterplot.

Parameters:
  • x (List[float] or Float64Expression) – List of x-values to be plotted.
  • y (List[float] or Float64Expression) – List of y-values to be plotted.
  • label (List[str] or StringExpression) – List of labels for x and y values, used to assign each point a label (e.g. population).
  • title (str) – Title of the scatterplot.
  • xlabel (str) – X-axis label.
  • ylabel (str) – Y-axis label.
  • size (int) – Size of markers in screen space units.
  • legend (bool) – Whether or not to show the legend in the resulting figure.
  • collect_all (bool) – Whether to collect all values or downsample before plotting. This parameter will be ignored if x and y are Python objects.
  • n_divisions (int) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints.
  • source_fields (Dict[str, List[Any]]) – Extra fields for the ColumnDataSource of the plot.
Returns:

bokeh.plotting.figure.Figure
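
One way to picture why a lower n_divisions yields fewer points is grid-based downsampling: divide the plot area into n_divisions × n_divisions cells and keep one point per occupied cell. The sketch below is an assumption-laden illustration, not Hail's downsampling algorithm, and grid_downsample is a hypothetical helper.

```python
def grid_downsample(points, n_divisions):
    """Keep the first point seen in each cell of an n_divisions x n_divisions grid."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)

    def cell(v, lo, hi):
        span = (hi - lo) or 1.0  # guard against all-equal coordinates
        return min(int((v - lo) / span * n_divisions), n_divisions - 1)

    kept = {}
    for x, y in points:
        # setdefault keeps the first point that falls into each cell
        kept.setdefault((cell(x, xmin, xmax), cell(y, ymin, ymax)), (x, y))
    return list(kept.values())

points = [(0.0, 0.0), (0.125, 0.125), (0.875, 0.875), (1.0, 1.0)]
coarse = grid_downsample(points, n_divisions=2)  # nearby points share a cell
fine = grid_downsample(points, n_divisions=8)    # a finer grid separates more of them
```

A coarser grid merges more neighbouring points into a single representative, which matches the documented behaviour that a lower value produces fewer output datapoints.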

hail.plot.qq(pvals, collect_all=False, n_divisions=500)

Create a Quantile-Quantile plot. (https://en.wikipedia.org/wiki/Q-Q_plot)

Parameters:
  • pvals (List[float] or Float64Expression) – P-values to be plotted.
  • collect_all (bool) – Whether to collect all values or downsample before plotting. This parameter will be ignored if pvals is a Python object.
  • n_divisions (int) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints.
Returns:

bokeh.plotting.figure.Figure
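
For readers new to Q-Q plots of p-values, the following sketch computes the points such a plot conventionally shows: observed -log10(p) against the value expected under a uniform distribution. The helper qq_points is hypothetical, and the rank / (n + 1) plotting position is one common convention, not necessarily the one Hail uses.

```python
import math

def qq_points(pvals):
    """Return (expected, observed) pairs of -log10 p-values, smallest p first."""
    ordered = sorted(pvals)  # p-values must be strictly positive
    n = len(ordered)
    return [
        # Assumption: the expected p-value at a given rank is rank / (n + 1).
        (-math.log10(rank / (n + 1)), -math.log10(p))
        for rank, p in enumerate(ordered, start=1)
    ]

pairs = qq_points([0.5, 0.01, 0.1])
```

Points that sit well above the diagonal (observed much larger than expected) indicate p-values smaller than a uniform distribution would produce.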

hail.plot.manhattan(pvals, locus=None, title=None, size=4, hover_fields=None, collect_all=False, n_divisions=500, significance_line=5e-08)

Create a Manhattan plot. (https://en.wikipedia.org/wiki/Manhattan_plot)

Parameters:
  • pvals (Float64Expression) – P-values to be plotted.
  • locus (LocusExpression) – Locus values to be plotted.
  • title (str) – Title of the plot.
  • size (int) – Size of markers in screen space units.
  • hover_fields (Dict[str, Expression]) – Dictionary of field names and values to be shown in the HoverTool of the plot.
  • collect_all (bool) – Whether to collect all values or downsample before plotting.
  • n_divisions (int) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints.
  • significance_line (float, optional) – p-value at which to add a horizontal, dotted red line indicating genome-wide significance. If None, no line is added.
Returns:

bokeh.plotting.figure.Figure
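
A Manhattan plot conventionally shows -log10(p) on the y-axis, so the default significance_line of 5e-08 corresponds to a height of about 7.3. The sketch below illustrates that arithmetic only. The helper manhattan_heights is hypothetical, and placing the line at -log10(significance_line) is an assumption about how the figure is drawn.

```python
import math

def manhattan_heights(pvals, significance_line=5e-8):
    """Return y-values, the line height (or None), and which points exceed it."""
    heights = [-math.log10(p) for p in pvals]
    # Assumption: the line is drawn at -log10(significance_line); None means no line.
    threshold = None if significance_line is None else -math.log10(significance_line)
    above = [threshold is not None and h > threshold for h in heights]
    return heights, threshold, above

heights, threshold, above = manhattan_heights([1e-3, 1e-9, 0.2])
_, no_line, none_above = manhattan_heights([1e-9], significance_line=None)
```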