ReferenceGenome
- class hail.genetics.ReferenceGenome
- Bases: object

An object that represents a reference genome.

Examples

>>> contigs = ["1", "X", "Y", "MT"]
>>> lengths = {"1": 249250621, "X": 155270560, "Y": 59373566, "MT": 16569}
>>> par = [("X", 60001, 2699521)]
>>> my_ref = hl.ReferenceGenome("my_ref", contigs, lengths, "X", "Y", "MT", par)

Notes

Hail comes with predefined reference genomes (the names are case sensitive):

- GRCh37, Genome Reference Consortium Human Build 37
- GRCh38, Genome Reference Consortium Human Build 38 
- GRCm38, Genome Reference Consortium Mouse Build 38 
- CanFam3, Canis lupus familiaris (dog) 

You can access these reference genome objects using get_reference():

>>> rg = hl.get_reference('GRCh37')
>>> rg = hl.get_reference('GRCh38')
>>> rg = hl.get_reference('GRCm38')
>>> rg = hl.get_reference('CanFam3')

Constructing a new reference genome, either with the class constructor or with read(), adds it to the list of known references, so it can be retrieved with get_reference() at any time afterwards.

Note

Reference genome names must be unique. It is not possible to overwrite the built-in reference genomes.

Note

Hail allows setting a default reference so that the reference_genome argument of import_vcf() does not need to be passed every time. It is a current limitation of Hail that a custom reference genome cannot be used as the default_reference argument of init(). To set a custom reference genome as the default, pass the reference as an argument to default_reference() after initializing Hail.

- Parameters:
- name (str) – Name of reference. Must be unique and NOT one of Hail’s predefined references: 'GRCh37', 'GRCh38', 'GRCm38', 'CanFam3' and 'default'.
- contigs (list of str) – Contig names.
- lengths (dict of str to int) – Dict of contig names to contig lengths.
- x_contigs (str or list of str) – Contigs to be treated as X chromosomes.
- y_contigs (str or list of str) – Contigs to be treated as Y chromosomes.
- mt_contigs (str or list of str) – Contigs to be treated as mitochondrial DNA.
- par (list of tuple of (str, int, int)) – List of tuples with (contig, start, end).
 

Attributes

- contigs – Contig names.
- global_positions_dict – Get a dictionary mapping contig names to their global genomic positions.
- lengths – Dict of contig name to contig length.
- mt_contigs – Mitochondrial contigs.
- name – Name of reference genome.
- par – Pseudoautosomal regions.
- x_contigs – X contigs.
- y_contigs – Y contigs.

Methods

- add_liftover – Register a chain file for liftover.
- add_sequence – Load the reference sequence from a FASTA file.
- contig_length – Contig length.
- from_fasta_file – Create reference genome from a FASTA file.
- has_liftover – True if a liftover chain file is available from this reference genome to the destination reference.
- has_sequence – True if the reference sequence has been loaded.
- locus_from_global_position – Constructs a locus from a global position in reference genome.
- read – Load reference genome from a JSON file.
- remove_liftover – Remove liftover to dest_reference_genome.
- remove_sequence – Remove the reference sequence.
- write – Write this reference genome to a file in JSON format.

- add_liftover(chain_file, dest_reference_genome)
- Register a chain file for liftover.

Examples

Access GRCh37 and GRCh38 using get_reference():

>>> rg37 = hl.get_reference('GRCh37')
>>> rg38 = hl.get_reference('GRCh38')

Add a chain file from 37 to 38:

>>> rg37.add_liftover('gs://hail-common/references/grch37_to_grch38.over.chain.gz', rg38)

Notes

This method can only be run once per reference genome. Use has_liftover() to test whether a chain file has been registered.

The chain file format is described here.

Chain files are hosted on Google Cloud for some of Hail’s built-in references:

- GRCh37 to GRCh38: gs://hail-common/references/grch37_to_grch38.over.chain.gz
- GRCh38 to GRCh37: gs://hail-common/references/grch38_to_grch37.over.chain.gz

Public download links are available here.

- Parameters:
- chain_file (str) – Path to chain file. Can be compressed (GZIP) or uncompressed.
- dest_reference_genome (str or ReferenceGenome) – Reference genome to convert to.
 
 
 - add_sequence(fasta_file, index_file=None)
- Load the reference sequence from a FASTA file.

Examples

Access the GRCh37 reference genome using get_reference():

>>> rg = hl.get_reference('GRCh37')

Add a sequence file:

>>> rg.add_sequence('gs://hail-common/references/human_g1k_v37.fasta.gz',
...                 'gs://hail-common/references/human_g1k_v37.fasta.fai')

Add a sequence file with the default index location:

>>> rg.add_sequence('gs://hail-common/references/human_g1k_v37.fasta.gz')

Notes

This method can only be run once per reference genome. Use has_sequence() to test whether a sequence is loaded.

FASTA and index files are hosted on Google Cloud for some of Hail’s built-in references:

GRCh37

- FASTA file: gs://hail-common/references/human_g1k_v37.fasta.gz
- Index file: gs://hail-common/references/human_g1k_v37.fasta.fai

GRCh38

- FASTA file: gs://hail-common/references/Homo_sapiens_assembly38.fasta.gz
- Index file: gs://hail-common/references/Homo_sapiens_assembly38.fasta.fai

Public download links are available here.
 - classmethod from_fasta_file(name, fasta_file, index_file, x_contigs=[], y_contigs=[], mt_contigs=[], par=[])
- Create reference genome from a FASTA file.
- Parameters:
- name (str) – Name for new reference genome.
- fasta_file (str) – Path to FASTA file. Can be compressed (GZIP) or uncompressed.
- index_file (str) – Path to FASTA index file. Must be uncompressed.
- x_contigs (str or list of str) – Contigs to be treated as X chromosomes.
- y_contigs (str or list of str) – Contigs to be treated as Y chromosomes.
- mt_contigs (str or list of str) – Contigs to be treated as mitochondrial DNA.
- par (list of tuple of (str, int, int)) – List of tuples with (contig, start, end).
 
- Returns:
- ReferenceGenome
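
A reference genome needs contig names and lengths, and a FASTA index (.fai) file carries both in its first two tab-separated columns. The sketch below shows how those two columns could be read in plain Python. It is only an illustration under that assumption: `contigs_from_fai` is a hypothetical helper, the two-contig index is made up, and nothing here describes how from_fasta_file() reads the index internally.

```python
import io


def contigs_from_fai(handle):
    """Return (contig_names, lengths) from a FASTA index (.fai) text stream.

    Hypothetical helper for illustration; it is not part of Hail.
    Each .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    """
    contigs = []
    lengths = {}
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # tolerate a trailing blank line
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        contigs.append(name)  # keep the order in which contigs appear
        lengths[name] = length
    return contigs, lengths


# Made-up index describing two short contigs.
example_fai = io.StringIO("x\t500\t3\t60\t61\ny\t300\t515\t60\t61\n")
contigs, lengths = contigs_from_fai(example_fai)
print(contigs)
print(lengths)
```

The resulting list and dict have the same shape as the contigs and lengths arguments of the class constructor.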
 
 - property global_positions_dict
- Get a dictionary mapping contig names to their global genomic positions.
- Returns:
- dict – A dictionary of contig names to global genomic positions.
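
One way to picture a global position is as a zero-based offset into all contigs laid end to end in contig order. The plain-Python sketch below builds such a mapping for the four contigs used in the class example. It assumes each contig maps to the offset of its first base, which this page does not state explicitly, and `global_positions` is a hypothetical name.

```python
from itertools import accumulate


def global_positions(contigs, lengths):
    """Map each contig to the zero-based offset of its first base.

    Assumption for illustration: contigs are concatenated in the order
    given, so a contig's offset is the total length of those before it.
    """
    # Running totals: the first contig starts at 0, the next at len(first), ...
    offsets = [0] + list(accumulate(lengths[c] for c in contigs))
    # zip() stops at the last contig, dropping the final grand total.
    return dict(zip(contigs, offsets))


contigs = ["1", "X", "Y", "MT"]
lengths = {"1": 249250621, "X": 155270560, "Y": 59373566, "MT": 16569}
positions = global_positions(contigs, lengths)
print(positions)
```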
 
 - has_liftover(dest_reference_genome)
- True if a liftover chain file is available from this reference genome to the destination reference.
- Parameters:
- dest_reference_genome (str or ReferenceGenome)
- Returns:
- bool
 
 - locus_from_global_position(global_pos)
- Constructs a locus from a global position in reference genome. The inverse of Locus.position().

Examples

>>> rg = hl.get_reference('GRCh37')
>>> rg.locus_from_global_position(0)
Locus(contig=1, position=1, reference_genome=GRCh37)

>>> rg.locus_from_global_position(2824183054)
Locus(contig=21, position=42584230, reference_genome=GRCh37)

>>> rg = hl.get_reference('GRCh38')
>>> rg.locus_from_global_position(2824183054)
Locus(contig=chr22, position=1, reference_genome=GRCh38)

- Parameters:
- global_pos (int) – Zero-based global base position along the reference genome. 
- Returns:
- Locus
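
The conversion from a zero-based global position to a contig and a one-based position can be sketched with a binary search over the contig start offsets. The code below is a plain-Python illustration using the toy four-contig reference from the class example. `locus_from_global` is a hypothetical function that returns a (contig, position) tuple rather than a Locus, and the concatenate-in-order layout is an assumption consistent with the examples above, where global position 0 is position 1 of the first contig.

```python
from bisect import bisect_right
from itertools import accumulate


def locus_from_global(global_pos, contigs, lengths):
    """Return (contig, one_based_position) for a zero-based global position.

    Hypothetical helper; assumes contigs are laid end to end in order.
    """
    # starts[i] is the global offset of contig i; starts[-1] is the total length.
    starts = [0] + list(accumulate(lengths[c] for c in contigs))
    if not 0 <= global_pos < starts[-1]:
        raise ValueError(f"global position {global_pos} is out of range")
    # Index of the last contig whose start offset is <= global_pos.
    i = bisect_right(starts, global_pos) - 1
    return contigs[i], global_pos - starts[i] + 1


contigs = ["1", "X", "Y", "MT"]
lengths = {"1": 249250621, "X": 155270560, "Y": 59373566, "MT": 16569}
print(locus_from_global(0, contigs, lengths))
print(locus_from_global(249250621, contigs, lengths))
```

With these lengths, position 0 falls on the first base of contig "1" and position 249250621 on the first base of contig "X".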
 
 - classmethod read(path)
- Load reference genome from a JSON file.

Notes

The JSON file must have the following format:

{"name": "my_reference_genome",
 "contigs": [{"name": "1", "length": 10000000},
             {"name": "2", "length": 20000000},
             {"name": "X", "length": 19856300},
             {"name": "Y", "length": 78140000},
             {"name": "MT", "length": 532}],
 "xContigs": ["X"],
 "yContigs": ["Y"],
 "mtContigs": ["MT"],
 "par": [{"start": {"contig": "X", "position": 60001}, "end": {"contig": "X", "position": 2699521}},
         {"start": {"contig": "Y", "position": 10001}, "end": {"contig": "Y", "position": 2649521}}]
}

name must be unique and must not overlap with Hail’s pre-instantiated references: 'GRCh37', 'GRCh38', 'GRCm38', 'CanFam3', and 'default'. The contig names in xContigs, yContigs, and mtContigs must be present in contigs. The intervals listed in par must have contigs in either xContigs or yContigs and must have positions between 0 and the contig length given in contigs.

- Parameters:
- path (str) – Path to JSON file.
- Returns:
- ReferenceGenome
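
The constraints listed in the notes can be checked before handing a file to read(). The sketch below is a plain-Python pre-flight check, not Hail's own validation: `check_reference_json` and `RESERVED_NAMES` are hypothetical names, and the check covers only the rules stated above (it cannot tell whether a name collides with a custom reference already registered in a session).

```python
import json

# Names reserved for Hail's pre-instantiated references, as listed in the notes.
RESERVED_NAMES = {"GRCh37", "GRCh38", "GRCm38", "CanFam3", "default"}


def check_reference_json(text):
    """Return a list of problems; an empty list means every check passed."""
    ref = json.loads(text)
    lengths = {c["name"]: c["length"] for c in ref["contigs"]}
    problems = []
    if ref["name"] in RESERVED_NAMES:
        problems.append(f"name {ref['name']!r} is reserved")
    # xContigs, yContigs and mtContigs must all appear in contigs.
    for key in ("xContigs", "yContigs", "mtContigs"):
        for contig in ref.get(key, []):
            if contig not in lengths:
                problems.append(f"{key}: {contig!r} is not in contigs")
    # PAR intervals must lie on an X or Y contig and within its length.
    sex_contigs = set(ref.get("xContigs", [])) | set(ref.get("yContigs", []))
    for interval in ref.get("par", []):
        for side in ("start", "end"):
            contig = interval[side]["contig"]
            position = interval[side]["position"]
            if contig not in sex_contigs:
                problems.append(f"par {side}: {contig!r} is not an X or Y contig")
            elif contig in lengths and not 0 <= position <= lengths[contig]:
                problems.append(f"par {side}: {position} is outside {contig!r}")
    return problems


good = json.dumps({
    "name": "my_reference_genome",
    "contigs": [{"name": "1", "length": 10000000},
                {"name": "X", "length": 19856300},
                {"name": "Y", "length": 78140000}],
    "xContigs": ["X"],
    "yContigs": ["Y"],
    "mtContigs": [],
    "par": [{"start": {"contig": "X", "position": 60001},
             "end": {"contig": "X", "position": 2699521}}],
})
print(check_reference_json(good))
```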
 
 - remove_liftover(dest_reference_genome)
- Remove liftover to dest_reference_genome.
- Parameters:
- dest_reference_genome (str or ReferenceGenome)
 
 - write(output)
- Write this reference genome to a file in JSON format.

Examples

>>> my_rg = hl.ReferenceGenome("new_reference", ["x", "y", "z"], {"x": 500, "y": 300, "z": 200})
>>> my_rg.write("output/new_reference.json")

Notes

Use read() to reimport the exported reference genome in a new HailContext session.

- Parameters:
- output (str) – Path of JSON file to write.
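
To show how constructor-style arguments relate to the JSON layout documented under read(), the sketch below assembles such a document by hand in plain Python. It is an illustration only: `reference_to_json` is a hypothetical helper that accepts lists (not the single-string shorthand the constructor allows), and the exact bytes that write() produces are not specified on this page, so the output here should be read as "a document in the read() layout" rather than a reproduction of write().

```python
import json


def reference_to_json(name, contigs, lengths,
                      x_contigs=(), y_contigs=(), mt_contigs=(), par=()):
    """Build a JSON string in the layout documented for read().

    Hypothetical helper; par is a sequence of (contig, start, end) tuples.
    """
    doc = {
        "name": name,
        "contigs": [{"name": c, "length": lengths[c]} for c in contigs],
        "xContigs": list(x_contigs),
        "yContigs": list(y_contigs),
        "mtContigs": list(mt_contigs),
        # Each (contig, start, end) tuple becomes a start/end pair of loci.
        "par": [{"start": {"contig": c, "position": start},
                 "end": {"contig": c, "position": end}}
                for c, start, end in par],
    }
    return json.dumps(doc, indent=2)


text = reference_to_json("new_reference", ["x", "y", "z"],
                         {"x": 500, "y": 300, "z": 200},
                         x_contigs=["x"], par=[("x", 1, 100)])
print(text)
```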