hailtop.fs Python API

This is the API documentation for Hail’s cloud-agnostic file system implementation in hailtop.fs.

Use import hailtop.fs as hfs to access this functionality.

Top-Level Functions

copy(src, dest, *[, requester_pays_config])

Copy a file between filesystems.

exists(path, *[, requester_pays_config])

Returns True if path exists.

is_dir(path, *[, requester_pays_config])

Returns True if path both exists and is a directory.

is_file(path, *[, requester_pays_config])

Returns True if path both exists and is a file.

ls(path, *[, requester_pays_config])

Returns information about files at path.

mkdir(path, *[, requester_pays_config])

Ensure files can be created whose dirname is path.

open(path[, mode, buffer_size, ...])

Open a file from the local filesystem or from blob storage.

remove(path, *[, requester_pays_config])

Removes the file at path.

rmtree(path, *[, requester_pays_config])

Recursively remove all files under the given path.

stat(path, *[, requester_pays_config])

Returns information about the file or directory at a given path.

hailtop.fs.copy(src, dest, *, requester_pays_config=None)[source]

Copy a file between filesystems. Each filesystem can be the local filesystem or one of the blob storage providers GCS, S3 and ABS.

Examples

Copy a file from Google Cloud Storage to a local file:

>>> hfs.copy('gs://hail-common/LCR.interval_list',
...          'file:///mnt/data/LCR.interval_list') 

Notes

If you are copying a file just to then load it into Python, you can use open() instead. For example:

>>> import pandas as pd
>>> with hfs.open('gs://my_bucket/results.csv', 'r') as f: 
...     df = pd.read_csv(f)

The provided source and destination file paths must be URIs (uniform resource identifiers) or local filesystem paths.

Parameters:
  • src (str) – Source file URI.

  • dest (str) – Destination file URI.
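
A minimal sketch of a guarded copy, combining copy() with exists() so that an existing destination is left untouched. The helper copy_if_absent and the InMemoryFS class are hypothetical names introduced for illustration; InMemoryFS only stands in for hailtop.fs so the sketch runs without cloud credentials. In real use you would pass the hfs module as the first argument.

```python
def copy_if_absent(fs, src, dest):
    """Copy src to dest unless dest already exists.

    `fs` is any object exposing exists() and copy() with the signatures
    documented on this page, e.g. the hailtop.fs module itself.
    Returns True if a copy was made, False if dest was already present.
    """
    if not fs.exists(src):
        raise FileNotFoundError(src)
    if fs.exists(dest):
        return False
    fs.copy(src, dest)
    return True


class InMemoryFS:
    """Hypothetical stand-in for hailtop.fs, used only to run the sketch offline."""

    def __init__(self, files):
        self.files = dict(files)

    def exists(self, path):
        return path in self.files

    def copy(self, src, dest):
        self.files[dest] = self.files[src]


fake = InMemoryFS({'gs://my-bucket/a.txt': b'hello'})
first = copy_if_absent(fake, 'gs://my-bucket/a.txt', 'file:///tmp/a.txt')
second = copy_if_absent(fake, 'gs://my-bucket/a.txt', 'file:///tmp/a.txt')
print(first, second)  # True False
```

The check and the copy are two separate calls, so this pattern is not atomic: another process could create dest between them.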

hailtop.fs.exists(path, *, requester_pays_config=None)[source]

Returns True if path exists.

Parameters:

path (str)

Returns:

bool

hailtop.fs.is_dir(path, *, requester_pays_config=None)[source]

Returns True if path both exists and is a directory.

Parameters:

path (str)

Returns:

bool

hailtop.fs.is_file(path, *, requester_pays_config=None)[source]

Returns True if path both exists and is a file.

Parameters:

path (str)

Returns:

bool
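
The three predicates exists(), is_dir() and is_file() can be combined to classify a path. The sketch below assumes only the documented behavior (each returns a bool). The helper classify is a hypothetical name, and local_fs is a stand-in built from os.path so the example runs without hailtop installed; passing the hfs module instead should work the same way.

```python
import os
import tempfile
from types import SimpleNamespace


def classify(fs, path):
    """Return 'missing', 'directory', 'file', or 'other' for path."""
    if not fs.exists(path):
        return 'missing'
    if fs.is_dir(path):
        return 'directory'
    if fs.is_file(path):
        return 'file'
    return 'other'


# Stand-in exposing the same three function names, backed by os.path.
local_fs = SimpleNamespace(
    exists=os.path.exists,
    is_dir=os.path.isdir,
    is_file=os.path.isfile,
)

with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, 'notes.txt')
    with open(file_path, 'w') as f:
        f.write('hello\n')
    results = [
        classify(local_fs, tmp),
        classify(local_fs, file_path),
        classify(local_fs, os.path.join(tmp, 'absent.txt')),
    ]

print(results)  # ['directory', 'file', 'missing']
```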

hailtop.fs.ls(path, *, requester_pays_config=None)[source]

Returns information about files at path.

Notes

Raises an error if path does not exist.

If path is a file, returns a list with one element. If path is a directory, returns an element for each file contained in path (does not search recursively).

Each dict element of the result list contains the following data:

  • is_dir (bool) – Path is a directory.

  • size_bytes (int) – Size in bytes.

  • size (str) – Size as a readable string.

  • modification_time (str) – Time of last file modification.

  • owner (str) – Owner.

  • path (str) – Path.

Parameters:

path (str)

Returns:

list [dict]
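
As a sketch of how the returned list might be consumed, the helper below totals file sizes and counts directories using only the is_dir, size_bytes and path keys described above. The function name summarize_listing and every value in the sample entries (paths, sizes, timestamps, owner) are hypothetical; in real use the list would come from hfs.ls(path).

```python
def summarize_listing(entries):
    """Summarize an ls()-style list of dicts: counts, total size, largest file."""
    files = [e for e in entries if not e['is_dir']]
    dirs = [e for e in entries if e['is_dir']]
    largest = max(files, key=lambda e: e['size_bytes'])['path'] if files else None
    return {
        'n_files': len(files),
        'n_dirs': len(dirs),
        'total_bytes': sum(e['size_bytes'] for e in files),
        'largest': largest,
    }


# Hypothetical entries shaped like the dicts described above.
entries = [
    {'is_dir': False, 'size_bytes': 1024, 'size': '1.0K',
     'modification_time': '2024-01-01 00:00:00', 'owner': 'alice',
     'path': 'gs://my-bucket/data/a.txt'},
    {'is_dir': False, 'size_bytes': 2048, 'size': '2.0K',
     'modification_time': '2024-01-02 00:00:00', 'owner': 'alice',
     'path': 'gs://my-bucket/data/b.tsv'},
    {'is_dir': True, 'size_bytes': 0, 'size': '0',
     'modification_time': '2024-01-03 00:00:00', 'owner': 'alice',
     'path': 'gs://my-bucket/data/sub'},
]

summary = summarize_listing(entries)
print(summary['n_files'], summary['n_dirs'], summary['total_bytes'])  # 2 1 3072
```

Because ls() does not search recursively, a total computed this way covers only the files directly under path.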

hailtop.fs.mkdir(path, *, requester_pays_config=None)[source]

Ensure files can be created whose dirname is path.

Warning

On file systems without a notion of directories, this function will do nothing. For example, on Google Cloud Storage, this operation does nothing.

hailtop.fs.open(path, mode='r', buffer_size=8192, *, requester_pays_config=None)[source]

Open a file from the local filesystem or from blob storage. Supported blob storage providers are GCS, S3 and ABS.

Examples

Write a Pandas DataFrame as a CSV directly into Google Cloud Storage:

>>> with hfs.open('gs://my-bucket/df.csv', 'w') as f: 
...     pandas_df.to_csv(f)

Read and print the lines of a text file stored in Google Cloud Storage:

>>> with hfs.open('gs://my-bucket/notes.txt') as f: 
...     for line in f:
...         print(line.strip())

Access a text file stored in a Requester Pays Bucket in Google Cloud Storage:

>>> with hfs.open( 
...     'gs://my-bucket/notes.txt',
...     requester_pays_config='my-project'
... ) as f:
...     for line in f:
...         print(line.strip())

Specify multiple Requester Pays Buckets within a project that are acceptable to access:

>>> with hfs.open( 
...     'gs://my-bucket/notes.txt',
...     requester_pays_config=('my-project', ['my-bucket', 'bucket-2'])
... ) as f:
...     for line in f:
...         print(line.strip())

Write two lines directly to a file in Google Cloud Storage:

>>> with hfs.open('gs://my-bucket/notes.txt', 'w') as f: 
...     f.write('result1: %s\n' % result1)
...     f.write('result2: %s\n' % result2)

Unpack a packed Python struct directly from a file in Google Cloud Storage:

>>> from struct import unpack
>>> with hfs.open('gs://my-bucket/notes.txt', 'rb') as f: 
...     print(unpack('<f', bytearray(f.read())))

Notes

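The supported modes include 'r' (read text, the default), 'w' (write text) and 'rb' (read binary), as used in the examples above.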

The provided file path must be a URI (uniform resource identifier) or a path on the local filesystem.

Parameters:
  • path (str) – Path to file.

  • mode (str) – File access mode.

  • buffer_size (int) – Buffer size, in bytes.

Returns:

Readable or writable file handle.
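
The examples above pass requester_pays_config in two forms: a project name alone, or a tuple of a project name and a list of bucket names. A small helper can build either form from the same inputs. This is a sketch based only on those two examples; the name make_requester_pays_config is hypothetical and is not part of hailtop.fs.

```python
def make_requester_pays_config(project, buckets=None):
    """Build a requester_pays_config value in one of the two forms shown above.

    With no buckets, return the project name alone. With buckets, return a
    (project, [bucket, ...]) tuple naming the buckets that may be accessed.
    """
    if not buckets:
        return project
    return (project, list(buckets))


any_bucket = make_requester_pays_config('my-project')
some_buckets = make_requester_pays_config('my-project', ['my-bucket', 'bucket-2'])

print(any_bucket)    # my-project
print(some_buckets)  # ('my-project', ['my-bucket', 'bucket-2'])
```

The result can then be passed as the requester_pays_config keyword argument to open() or to any of the other functions on this page that accept it.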

hailtop.fs.remove(path, *, requester_pays_config=None)[source]

Removes the file at path. If the file does not exist, this function does nothing. path must be a URI (uniform resource identifier) or a path on the local filesystem.

Parameters:

path (str)

hailtop.fs.rmtree(path, *, requester_pays_config=None)[source]

Recursively remove all files under the given path. On a local filesystem, this removes the directory tree at path. On blob storage providers such as GCS, S3 and ABS, this removes all files whose name starts with path. path must be a URI (uniform resource identifier) or a path on the local filesystem.

Parameters:

path (str)
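
Read literally, "all files whose name starts with path" describes plain string-prefix matching, which makes a trailing slash significant. The sketch below illustrates that matching rule on a hypothetical list of object names. It does not call hailtop.fs, and it makes no claim about whether rmtree() normalizes path (for example by appending a separator) before matching; the helper names_matching_prefix is an assumption introduced for illustration.

```python
def names_matching_prefix(object_names, path):
    """Return the names that start with path, using plain string-prefix matching."""
    return [name for name in object_names if name.startswith(path)]


# Hypothetical object names in a bucket.
names = [
    'gs://my-bucket/data/a.txt',
    'gs://my-bucket/data/sub/b.txt',
    'gs://my-bucket/data-old/c.txt',
    'gs://my-bucket/other.txt',
]

without_slash = names_matching_prefix(names, 'gs://my-bucket/data')
with_slash = names_matching_prefix(names, 'gs://my-bucket/data/')

print(len(without_slash))  # 3 (includes data-old/c.txt)
print(len(with_slash))     # 2
```

Under pure prefix matching, ending path with a slash is the conservative choice when a sibling such as data-old must not be affected.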

hailtop.fs.stat(path, *, requester_pays_config=None)[source]

Returns information about the file or directory at a given path.

Notes

Raises an error if path does not exist.

The resulting dictionary contains the following data:

  • is_dir (bool) – Path is a directory.

  • size_bytes (int) – Size in bytes.

  • size (str) – Size as a readable string.

  • modification_time (str) – Time of last file modification.

  • owner (str) – Owner.

  • path (str) – Path.

Parameters:

path (str)

Returns:

dict