hailtop.fs Python API
This is the API documentation for Hail’s cloud-agnostic file system implementation in hailtop.fs.
Use import hailtop.fs as hfs to access this functionality.
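For example, a minimal check that an object exists as a file (the bucket and path below are placeholders):
>>> import hailtop.fs as hfs
>>> hfs.is_file('gs://my-bucket/data/samples.tsv')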
Top-Level Functions
- copy: Copy a file between filesystems.
- exists: Returns True if path exists.
- is_dir: Returns True if path both exists and is a directory.
- is_file: Returns True if path both exists and is a file.
- ls: Returns information about files at path.
- mkdir: Ensure files can be created whose dirname is path.
- open: Open a file from the local filesystem or from blob storage.
- remove: Removes the file at path.
- rmtree: Recursively remove all files under the given path.
- stat: Returns information about the file or directory at a given path.
- hailtop.fs.copy(src, dest, *, requester_pays_config=None)[source]
Copy a file between filesystems. Filesystems can be the local filesystem or one of the blob storage providers GCS, S3 and ABS.
Examples
Copy a file from Google Cloud Storage to a local file:
>>> hfs.copy('gs://hail-common/LCR.interval_list',
...          'file:///mnt/data/LCR.interval_list')
Notes
If you are copying a file only to then load it into Python, you can use open() instead. For example:
>>> import pandas as pd
>>> with hfs.open('gs://my_bucket/results.csv', 'r') as f:
...     df = pd.read_csv(f)
The provided source and destination file paths must be URIs (uniform resource identifiers) or local filesystem paths.
- hailtop.fs.is_dir(path, *, requester_pays_config=None)[source]
Returns True if path both exists and is a directory.
- hailtop.fs.is_file(path, *, requester_pays_config=None)[source]
Returns True if path both exists and is a file.
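For illustration, a minimal sketch using placeholder paths (each call returns a bool):
>>> hfs.is_dir('gs://my-bucket/data/')
>>> hfs.is_file('gs://my-bucket/data/samples.tsv')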
- hailtop.fs.ls(path, *, requester_pays_config=None)[source]
Returns information about files at path.
Notes
Raises an error if path does not exist.
If path is a file, returns a list with one element. If path is a directory, returns an element for each file contained in path (does not search recursively).
Each dict element of the result list contains the following data:
- hailtop.fs.mkdir(path, *, requester_pays_config=None)[source]
Ensure files can be created whose dirname is path.
Warning
On file systems without a notion of directories, this function will do nothing. For example, on Google Cloud Storage, this operation does nothing.
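As a minimal local-filesystem sketch (the directory name is illustrative; on blob storage providers such as GCS this call is a no-op, as noted above):
>>> hfs.mkdir('file:///tmp/analysis-output')
>>> with hfs.open('file:///tmp/analysis-output/results.txt', 'w') as f:
...     f.write('done\n')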
- hailtop.fs.open(path, mode='r', buffer_size=8192, *, requester_pays_config=None)[source]
Open a file from the local filesystem or from blob storage. Supported blob storage providers are GCS, S3 and ABS.
Examples
Write a Pandas DataFrame as a CSV directly into Google Cloud Storage:
>>> with hfs.open('gs://my-bucket/df.csv', 'w') as f:
...     pandas_df.to_csv(f)
Read and print the lines of a text file stored in Google Cloud Storage:
>>> with hfs.open('gs://my-bucket/notes.txt') as f:
...     for line in f:
...         print(line.strip())
Access a text file stored in a Requester Pays Bucket in Google Cloud Storage:
>>> with hfs.open(
...     'gs://my-bucket/notes.txt',
...     requester_pays_config='my-project'
... ) as f:
...     for line in f:
...         print(line.strip())
Specify multiple Requester Pays Buckets within a project that are acceptable to access:
>>> with hfs.open(
...     'gs://my-bucket/notes.txt',
...     requester_pays_config=('my-project', ['my-bucket', 'bucket-2'])
... ) as f:
...     for line in f:
...         print(line.strip())
Write two lines directly to a file in Google Cloud Storage:
>>> with hfs.open('gs://my-bucket/notes.txt', 'w') as f:
...     f.write('result1: %s\n' % result1)
...     f.write('result2: %s\n' % result2)
Unpack a packed Python struct directly from a file in Google Cloud Storage:
>>> from struct import unpack
>>> with hfs.open('gs://my-bucket/notes.txt', 'rb') as f:
...     print(unpack('<f', bytearray(f.read())))
Notes
The supported modes are:
- 'r' – Readable text file (io.TextIOWrapper). Default behavior.
- 'w' – Writable text file (io.TextIOWrapper).
- 'x' – Exclusive writable text file (io.TextIOWrapper). Throws an error if a file already exists at the path.
- 'rb' – Readable binary file (io.BufferedReader).
- 'wb' – Writable binary file (io.BufferedWriter).
- 'xb' – Exclusive writable binary file (io.BufferedWriter). Throws an error if a file already exists at the path.
The provided destination file path must be a URI (uniform resource identifier) or a path on the local filesystem.
- hailtop.fs.remove(path, *, requester_pays_config=None)[source]
Removes the file at path. If the file does not exist, this function does nothing. path must be a URI (uniform resource identifier) or a path on the local filesystem.
- Parameters: path (str)
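A minimal sketch (the object path is a placeholder):
>>> hfs.remove('gs://my-bucket/stale-results.csv')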
- hailtop.fs.rmtree(path, *, requester_pays_config=None)[source]
Recursively remove all files under the given path. On a local filesystem, this removes the directory tree at path. On blob storage providers such as GCS, S3 and ABS, this removes all files whose name starts with path. As such, path must be a URI (uniform resource identifier) or a path on the local filesystem.
- Parameters: path (str)
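A minimal sketch (the prefix is a placeholder); on blob storage this removes every object whose name starts with the given prefix:
>>> hfs.rmtree('gs://my-bucket/tmp-run-outputs/')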
- hailtop.fs.stat(path, *, requester_pays_config=None)[source]
Returns information about the file or directory at a given path.
Notes
Raises an error if path does not exist.
The resulting dictionary contains the following data: