Plot
Warning
Plotting functionality is in early stages and is experimental. Interfaces will change regularly.
Hail's plot functions use the Bokeh plotting library to create attractive, interactive figures. Plotting functions in this module return a Bokeh figure, so you can plot your data with a single call and then extend the plot however you like by working with Bokeh directly. See the GWAS tutorial for examples.
Plot functions in Hail accept data in the form of either Python objects or Table and MatrixTable fields.
cdf – Create a cumulative distribution plot.
smoothed_pdf – Create a density plot.
histogram – Create a histogram.
cumulative_histogram – Create a cumulative histogram.
histogram2d – Plot a two-dimensional histogram.
scatter – Create an interactive scatter plot.
qq – Create a Quantile-Quantile plot.
manhattan – Create a Manhattan plot.
output_notebook – Configure the Bokeh output state to generate output in notebook cells when bokeh.io.show() is called.
visualize_missingness – Visualize missingness in a MatrixTable.
- hail.plot.cdf(data, k=350, legend=None, title=None, normalize=True, log=False)
Create a cumulative distribution plot.
- Parameters:
data (Struct or Float64Expression) – Sequence of data to plot.
k (int) – Accuracy parameter (passed to approx_cdf()).
legend (str) – Label of data on the x-axis.
title (str) – Title of the plot.
normalize (bool) – Whether the cumulative data should be normalized.
log (bool) – Whether to use a logarithmic y-axis.
- Returns:
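To make the effect of normalize concrete, the sketch below computes an exact empirical cumulative distribution in plain Python. It is an illustration under stated assumptions, not Hail's implementation: cdf() works from the approximate summary produced by approx_cdf() (whose accuracy is controlled by k), while this sketch sorts every value in memory. The helper name empirical_cdf is hypothetical.

```python
from typing import List, Sequence, Tuple


def empirical_cdf(data: Sequence[float], normalize: bool = True) -> List[Tuple[float, float]]:
    """Return (value, cumulative) pairs for each distinct value in `data`.

    With normalize=True the cumulative column is the fraction of values <= value;
    otherwise it is the raw count of values <= value.
    """
    ordered = sorted(data)
    n = len(ordered)
    points: List[Tuple[float, float]] = []
    for i, value in enumerate(ordered, start=1):
        # Collapse ties: emit a point only at the last occurrence of each value.
        if i < n and ordered[i] == value:
            continue
        points.append((value, i / n if normalize else float(i)))
    return points


print(empirical_cdf([3.0, 1.0, 2.0, 2.0]))
# [(1.0, 0.25), (2.0, 0.75), (3.0, 1.0)]
print(empirical_cdf([3.0, 1.0, 2.0, 2.0], normalize=False))
# [(1.0, 1.0), (2.0, 3.0), (3.0, 4.0)]
```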
- hail.plot.pdf(data, k=1000, confidence=5, legend=None, title=None, log=False, interactive=False)
- hail.plot.smoothed_pdf(data, k=350, smoothing=0.5, legend=None, title=None, log=False, interactive=False, figure=None)
Create a density plot.
- Parameters:
data (Struct or Float64Expression) – Sequence of data to plot.
k (int) – Accuracy parameter.
smoothing (float) – Degree of smoothing.
legend (str) – Label of data on the x-axis.
title (str) – Title of the plot.
log (bool) – Plot the log10 of the bin counts.
interactive (bool) – If True, return a handle to pass to bokeh.io.show().
figure (bokeh.plotting.figure) – If not None, add the density plot to this figure. Otherwise, create a new figure.
- Returns:
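The smoothing parameter trades detail for smoothness. The sketch below shows that trade-off with a textbook Gaussian kernel density estimate in plain Python, where a larger bandwidth plays a role analogous to a larger smoothing value. This is an assumption for illustration only: it is not how Hail computes smoothed_pdf, and Hail's smoothing value does not necessarily map onto this bandwidth. The helper gaussian_kde is hypothetical.

```python
import math
from typing import List, Sequence


def gaussian_kde(data: Sequence[float], grid: Sequence[float], bandwidth: float) -> List[float]:
    """Evaluate a Gaussian kernel density estimate of `data` at each grid point.

    Each data point contributes a normal bump of standard deviation `bandwidth`;
    wider bumps (larger bandwidth) give a smoother, flatter curve.
    Assumes `data` is non-empty and bandwidth > 0.
    """
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))
    densities = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - d) / bandwidth) ** 2) for d in data)
        densities.append(norm * total)
    return densities


# A single point at 0 gives a peak of 1/sqrt(2*pi) with bandwidth 1,
# and a lower, broader peak as the bandwidth grows.
peak_sharp = gaussian_kde([0.0], [0.0], 0.5)[0]
peak_smooth = gaussian_kde([0.0], [0.0], 2.0)[0]
```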
- hail.plot.histogram(data, range=None, bins=50, legend=None, title=None, log=False, interactive=False)
Create a histogram.
Notes
data can be a Float64Expression, or the result of the hist() or approx_cdf() aggregators.
- Parameters:
data (Struct or Float64Expression) – Sequence of data to plot.
range (Tuple[float]) – Range of x values in the histogram.
bins (int) – Number of bins in the histogram.
legend (str) – Label of data on the x-axis.
title (str) – Title of the histogram.
log (bool) – Plot the log10 of the bin counts.
- Returns:
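The range, bins and log parameters can be pictured with the plain-Python sketch below, which splits the range into equal-width bins and counts the values in each. It is an illustration, not Hail's implementation: the parameter is named value_range here only to avoid shadowing Python's built-in range, values outside the range are simply dropped, and returning None for empty bins under log is a choice of this sketch. The helpers bin_counts and log10_counts are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def bin_counts(
    data: Sequence[float],
    value_range: Optional[Tuple[float, float]] = None,
    bins: int = 50,
) -> Tuple[List[float], List[int]]:
    """Count values in `bins` equal-width bins; return (edges, counts)."""
    # If no range is given, span the observed minimum and maximum.
    lo, hi = value_range if value_range is not None else (min(data), max(data))
    if hi <= lo:
        raise ValueError("value_range must satisfy lo < hi")
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for x in data:
        if x < lo or x > hi:
            continue  # this sketch ignores values outside the range
        # Clamp so that x == hi falls in the last bin instead of index `bins`.
        counts[min(int((x - lo) / width), bins - 1)] += 1
    return edges, counts


def log10_counts(counts: Sequence[int]) -> List[Optional[float]]:
    """Analogue of log=True: log10 of each count, None for empty bins."""
    return [math.log10(c) if c > 0 else None for c in counts]


edges, counts = bin_counts([0.0, 0.5, 1.0, 2.5, 4.0, 9.0], value_range=(0.0, 4.0), bins=4)
# edges == [0.0, 1.0, 2.0, 3.0, 4.0], counts == [2, 1, 1, 1]
```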
- hail.plot.cumulative_histogram(data, range=None, bins=50, legend=None, title=None, normalize=True, log=False)
Create a cumulative histogram.
- Parameters:
data (Struct or Float64Expression) – Sequence of data to plot.
range (Tuple[float]) – Range of x values in the histogram.
bins (int) – Number of bins in the histogram.
legend (str) – Label of data on the x-axis.
title (str) – Title of the histogram.
normalize (bool) – Whether or not the cumulative data should be normalized.
log (bool) – Whether to use a logarithmic y-axis.
- Returns:
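Conceptually, a cumulative histogram is a running sum over the bins of an ordinary histogram. The minimal sketch below assumes that normalize=True means dividing by the total count so that the final bar reaches 1; the helper name cumulative_counts is hypothetical and this is not Hail's code.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_counts(counts: Sequence[int], normalize: bool = True) -> List[float]:
    """Turn per-bin counts into a running total, optionally scaled to end at 1."""
    running = list(accumulate(counts))
    if not normalize:
        return [float(c) for c in running]
    total = running[-1] if running else 0
    if total == 0:
        # Avoid dividing by zero when every bin is empty.
        return [0.0 for _ in running]
    return [c / total for c in running]


print(cumulative_counts([2, 1, 1, 0]))                   # [0.5, 0.75, 1.0, 1.0]
print(cumulative_counts([2, 1, 1, 0], normalize=False))  # [2.0, 3.0, 4.0, 4.0]
```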
- hail.plot.histogram2d(x, y, bins=40, range=None, title=None, width=600, height=600, colors=('#eff3ff', '#c6dbef', '#9ecae1', '#6baed6', '#4292c6', '#2171b5', '#084594'), log=False)
Plot a two-dimensional histogram.
x and y must both be a NumericExpression from the same Table.
If range is not provided, or either of its inner elements is None, the function makes a pass through the data to determine the min and max of each variable.
Examples
>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y)

>>> ht = hail.utils.range_table(1000).annotate(x=hail.rand_norm(), y=hail.rand_norm())
>>> p_hist = hail.plot.histogram2d(ht.x, ht.y, bins=10, range=((0, 1), None))
- Parameters:
x (NumericExpression) – Expression for x-axis (from a Hail table).
y (NumericExpression) – Expression for y-axis (from the same Hail table as x).
bins (int or [int, int]) – The bin specification:
  - If int, the number of bins for the two dimensions (nx = ny = bins).
  - If [int, int], the number of bins in each dimension (nx, ny = bins).
  The default value is 40.
range (None or ((float, float), (float, float))) – The leftmost and rightmost edges of the bins along each dimension: ((xmin, xmax), (ymin, ymax)). All values outside of this range will be considered outliers and not tallied in the histogram. If this value is None, or either of the inner lists is None, the range will be computed from the data.
width (int) – Plot width (default 600px).
height (int) – Plot height (default 600px).
title (str) – Title of the plot.
colors (Sequence[str]) – List of colors (hex codes, or color strings as described in the Bokeh documentation for color properties). Compatible with the many built-in palettes in bokeh.palettes.
log (bool) – Plot the log10 of the bin counts.
- Returns:
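The bins and range rules above can be illustrated with the plain-Python sketch below: an int gives the same bin count on both axes, a pair gives (nx, ny), a missing range (or a missing inner range) is computed from the data, and points outside the range are not tallied. It is a sketch under these assumptions, not Hail's implementation; histogram_2d is a hypothetical name, and value_range stands in for range to avoid shadowing Python's built-in.

```python
from typing import List, Optional, Sequence, Tuple, Union

Range = Optional[Tuple[float, float]]


def histogram_2d(
    xs: Sequence[float],
    ys: Sequence[float],
    bins: Union[int, Tuple[int, int]] = 40,
    value_range: Optional[Tuple[Range, Range]] = None,
) -> List[List[int]]:
    """Return counts[i][j] for x-bin i and y-bin j (assumes each range has lo < hi)."""
    nx, ny = (bins, bins) if isinstance(bins, int) else bins
    x_range, y_range = value_range if value_range is not None else (None, None)
    # A missing range, or a missing inner range, is taken from the data.
    x_lo, x_hi = x_range if x_range is not None else (min(xs), max(xs))
    y_lo, y_hi = y_range if y_range is not None else (min(ys), max(ys))
    counts = [[0] * ny for _ in range(nx)]
    for x, y in zip(xs, ys):
        if not (x_lo <= x <= x_hi and y_lo <= y <= y_hi):
            continue  # outlier: not tallied
        # Clamp so points on the upper edge land in the last bin.
        i = min(int((x - x_lo) / (x_hi - x_lo) * nx), nx - 1)
        j = min(int((y - y_lo) / (y_hi - y_lo) * ny), ny - 1)
        counts[i][j] += 1
    return counts


# Mirrors the second example above: fixed x range, y range computed from the data.
grid = histogram_2d([0.2, 0.7], [5.0, 6.0], bins=2, value_range=((0.0, 1.0), None))
# grid == [[1, 0], [0, 1]]
```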
- hail.plot.scatter(x, y, label=None, title=None, xlabel=None, ylabel=None, size=4, legend=True, hover_fields=None, colors=None, width=800, height=800, collect_all=None, n_divisions=500, missing_label='NA')
Create an interactive scatter plot.
x and y must both be either:
- a NumericExpression from the same Table, or
- a tuple (str, NumericExpression) from the same Table. If passed as a tuple, the first element is used as the hover label.
If no label or a single label is provided, returns a bokeh.plotting.figure. Otherwise returns a bokeh.models.layouts.Column containing:
- a bokeh.models.widgets.inputs.Select dropdown selection widget for labels
- a bokeh.plotting.figure containing the interactive scatter plot
Points are colored by one of the labels defined in label, using the color scheme defined in the corresponding entry of colors if provided (otherwise a default scheme is used). To specify your own color mapper, see the Bokeh documentation for CategoricalColorMapper (categorical labels) and for LinearColorMapper and LogColorMapper (continuous labels). For categorical labels, clicking an item in the legend hides or shows all points with the corresponding label. Using many different labelling schemes in the same plot, particularly if those labels contain many different classes, can slow down plot interactions.
Hovering on points displays their coordinates, labels and any additional fields specified in hover_fields.
- Parameters:
x (NumericExpression or (str, NumericExpression)) – List of x-values to be plotted.
y (NumericExpression or (str, NumericExpression)) – List of y-values to be plotted.
label (Expression or Dict[str, Expression], optional) – Either a single expression (if a single label is desired), or a dictionary of label name -> label value for x and y values. Used to color each point w.r.t. its label. When multiple labels are given, a dropdown is displayed with the different options. Can be used with categorical or continuous expressions.
title (str, optional) – Title of the scatterplot.
xlabel (str, optional) – X-axis label.
ylabel (str, optional) – Y-axis label.
size (int) – Size of markers in screen space units.
legend (bool) – Whether to show the legend in the resulting figure.
hover_fields (Dict[str, Expression], optional) – Extra fields to be displayed when hovering over a point on the plot.
colors (bokeh.models.mappers.ColorMapper or Dict[str, bokeh.models.mappers.ColorMapper], optional) – If a single label is used, this can be a color mapper; if multiple labels are used, this should be a Dict of label name -> color mapper. Used to set colors for the labels defined using label. If colors is not given, or a label name does not appear in the dict, that label is colored using a default color scheme.
width (int) – Plot width.
height (int) – Plot height.
collect_all (bool, optional) – Deprecated. Use n_divisions instead.
n_divisions (int, optional) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints. Use None to collect all points.
missing_label (str) – Label to use when a point is missing data for a categorical label.
- Returns:
bokeh.models.Plot if no label or a single label was given, otherwise bokeh.models.layouts.Column.
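Why does a lower n_divisions give fewer points? One common downsampling scheme lays a grid over the data and keeps one point per occupied cell, so a coarser grid keeps fewer points. The sketch below assumes that scheme for illustration; it is not taken from Hail's source, and grid_downsample is a hypothetical helper. Unlike Hail, it does not accept None to collect all points.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def grid_downsample(points: Sequence[Point], n_divisions: int) -> List[Point]:
    """Keep at most one point per cell of an n_divisions x n_divisions grid."""
    if not points:
        return []
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, y_lo = min(xs), min(ys)
    # Guard against a zero-width extent (all points share an x or a y value).
    x_span = (max(xs) - x_lo) or 1.0
    y_span = (max(ys) - y_lo) or 1.0
    kept: Dict[Tuple[int, int], Point] = {}
    for x, y in points:
        cell = (
            min(int((x - x_lo) / x_span * n_divisions), n_divisions - 1),
            min(int((y - y_lo) / y_span * n_divisions), n_divisions - 1),
        )
        kept.setdefault(cell, (x, y))  # the first point seen in a cell is kept
    return list(kept.values())


# 10,000 points on a 100 x 100 lattice.
lattice = [(float(i), float(j)) for i in range(100) for j in range(100)]
coarse = grid_downsample(lattice, 10)   # one point per cell of a 10 x 10 grid: 100 points
fine = grid_downsample(lattice, 500)    # every point lands in its own cell: 10,000 points
```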
- hail.plot.qq(pvals, label=None, title='Q-Q plot', xlabel='Expected -log10(p)', ylabel='Observed -log10(p)', size=6, legend=True, hover_fields=None, colors=None, width=800, height=800, collect_all=None, n_divisions=500, missing_label='NA')
Create a Quantile-Quantile plot. (https://en.wikipedia.org/wiki/Q-Q_plot)
If no label or a single label is provided, returns a bokeh.plotting.figure. Otherwise returns a bokeh.models.layouts.Column containing:
- a bokeh.models.widgets.inputs.Select dropdown selection widget for labels
- a bokeh.plotting.figure containing the interactive Q-Q plot
Points are colored by one of the labels defined in label, using the color scheme defined in the corresponding entry of colors if provided (otherwise a default scheme is used). To specify your own color mapper, see the Bokeh documentation for CategoricalColorMapper (categorical labels) and for LinearColorMapper and LogColorMapper (continuous labels). For categorical labels, clicking an item in the legend hides or shows all points with the corresponding label. Using many different labelling schemes in the same plot, particularly if those labels contain many different classes, can slow down plot interactions.
Hovering on points displays their coordinates, labels and any additional fields specified in hover_fields.
- Parameters:
pvals (NumericExpression) – P-values to be plotted.
label (Expression or Dict[str, Expression]) – Either a single expression (if a single label is desired), or a dictionary of label name -> label value for x and y values. Used to color each point w.r.t. its label. When multiple labels are given, a dropdown is displayed with the different options. Can be used with categorical or continuous expressions.
title (str, optional) – Title of the plot.
xlabel (str, optional) – X-axis label.
ylabel (str, optional) – Y-axis label.
size (int) – Size of markers in screen space units.
legend (bool) – Whether to show the legend in the resulting figure.
hover_fields (Dict[str, Expression], optional) – Extra fields to be displayed when hovering over a point on the plot.
colors (bokeh.models.mappers.ColorMapper or Dict[str, bokeh.models.mappers.ColorMapper], optional) – If a single label is used, this can be a color mapper; if multiple labels are used, this should be a Dict of label name -> color mapper. Used to set colors for the labels defined using label. If colors is not given, or a label name does not appear in the dict, that label is colored using a default color scheme.
width (int) – Plot width.
height (int) – Plot height.
collect_all (bool) – Deprecated. Use n_divisions instead.
n_divisions (int, optional) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints. Use None to collect all points.
missing_label (str) – Label to use when a point is missing data for a categorical label.
- Returns:
bokeh.plotting.figure if no label or a single label was given, otherwise bokeh.models.layouts.Column.
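The default axis labels, Expected -log10(p) and Observed -log10(p), describe how the points are formed: sort the observed p-values and pair each with the value expected under a Uniform(0, 1) null. The sketch below assumes the common i / (n + 1) plotting position for the expected quantiles; Hail may use a different convention, and qq_points is a hypothetical helper. P-values must lie in (0, 1].

```python
import math
from typing import List, Sequence, Tuple


def qq_points(pvals: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected -log10 p, observed -log10 p) pairs for a Q-Q plot.

    Under the null the i-th smallest of n p-values is expected near i / (n + 1).
    """
    n = len(pvals)
    observed = sorted(pvals)
    return [
        (-math.log10(i / (n + 1)), -math.log10(p))
        for i, p in enumerate(observed, start=1)
    ]


points = qq_points([0.5, 0.001, 0.25])
# The smallest p-value (0.001) pairs with expected 1/4: (log10(4), 3.0), up to rounding.
```

Points that rise above the diagonal (observed greater than expected) indicate more small p-values than the null predicts.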
- hail.plot.manhattan(pvals, locus=None, title=None, size=4, hover_fields=None, collect_all=None, n_divisions=500, significance_line=5e-08)
Create a Manhattan plot. (https://en.wikipedia.org/wiki/Manhattan_plot)
- Parameters:
pvals (Float64Expression) – P-values to be plotted.
locus (LocusExpression, optional) – Locus values to be plotted.
title (str, optional) – Title of the plot.
size (int) – Size of markers in screen space units.
hover_fields (Dict[str, Expression], optional) – Dictionary of field names and values to be shown in the HoverTool of the plot.
collect_all (bool, optional) – Deprecated. Use n_divisions instead.
n_divisions (int, optional) – Factor by which to downsample (default value = 500). A lower input results in fewer output datapoints. Use None to collect all points.
significance_line (float, optional) – P-value at which to add a horizontal, dotted red line indicating genome-wide significance. If None, no line is added.
- Returns:
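Assuming the usual Manhattan-plot y-axis of -log10(p), the default significance_line=5e-08 is drawn at a height of about 7.3. The sketch below computes that height and lists the p-values at or above the line; significant_hits is a hypothetical helper for illustration, not part of hail.plot.

```python
import math
from typing import List, Sequence, Tuple


def significant_hits(
    pvals: Sequence[float], significance_line: float = 5e-8
) -> List[Tuple[int, float]]:
    """Return (index, -log10 p) for p-values plotted at or above the significance line."""
    threshold = -math.log10(significance_line)
    hits = []
    for index, p in enumerate(pvals):
        score = -math.log10(p)  # height of this point on the y-axis
        if score >= threshold:
            hits.append((index, score))
    return hits


print(round(-math.log10(5e-8), 2))  # 7.3
hits = significant_hits([0.01, 1e-9, 5e-8, 3e-7])  # indices 1 and 2 reach the line
```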
- hail.plot.output_notebook()
Configure the Bokeh output state to generate output in notebook cells when bokeh.io.show() is called. Calls bokeh.io.output_notebook().
- hail.plot.visualize_missingness(entry_field, row_field=None, column_field=None, window=6000000, plot_width=1800, plot_height=900)
Visualize missingness in a MatrixTable.
Inspired by naniar.
The row field is windowed by default, and missingness is aggregated over each window to give the proportion of missing entries. The default window is 6,000,000, so that the human genome is divided into ~500 rows. With ~2,000 columns, this windowing produces a sensibly sized plot.
Warning
Generating a plot with more than ~1M points takes a long time for Bokeh to render. Consider windowing carefully.
- Parameters:
entry_field (Expression) – Field for which to check missingness.
row_field (NumericExpression or LocusExpression) – Row field to use for the y-axis (can be windowed). If not provided, the row key will be used.
column_field (StringExpression) – Column field to use for the x-axis. If not provided, the column key will be used.
window (int, optional) – Size of window to summarize by row_field. If set to None, each field will be used individually.
plot_width (int) – Plot width in px.
plot_height (int) – Plot height in px.
- Returns:
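The windowing described above can be sketched in plain Python: each row is assigned to the window containing its position, and the plotted value for a (window, column) cell is the fraction of missing entries in that cell. With a genome of roughly 3.1 billion bases, 6,000,000-base windows give about 520 windows, in line with the ~500 rows quoted above. This sketch assumes integer positions on a single contig and uses None for missing entries; how Hail windows a LocusExpression across chromosomes is not specified here, and windowed_missingness is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Optional, Sequence, Tuple


def windowed_missingness(
    positions: Sequence[int],
    entries: Sequence[Sequence[Optional[float]]],
    window: int = 6_000_000,
) -> Dict[Tuple[int, int], float]:
    """Fraction of missing entries per (window start, column index).

    positions[r] is the position of row r; entries[r][c] is the entry for
    row r and column c, with None marking a missing value.
    """
    missing: Dict[Tuple[int, int], int] = defaultdict(int)
    total: Dict[Tuple[int, int], int] = defaultdict(int)
    for pos, row in zip(positions, entries):
        start = (pos // window) * window  # left edge of the window containing pos
        for col, value in enumerate(row):
            total[(start, col)] += 1
            if value is None:
                missing[(start, col)] += 1
    return {key: missing[key] / total[key] for key in total}


result = windowed_missingness(
    [100, 5_999_999, 6_000_000],
    [[None, 1.0], [2.0, None], [None, None]],
)
# The first two rows share the window starting at 0; the third starts a new window.
```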