biocantor.gene.variants
This module contains VariantInterval
, which models diploid sequence variation.
This model is intended to be as simple as possible, and represent a single alternative haplotype. Variants are always represented on the positive strand, and is loosely modeled after VCF files.
Variants must be represented by intervals of at least length 1 in all cases. It is better to left-pad indels, but it is not required.
Here are examples of how variants can be represented.
A SNV:
# ref: ACTCTCTCTATCTCATCCAC
# alt: AGTCTCTCTATCTCATCCAC
snp_1 = VariantInterval(start=1, end=2, sequence="G", variant_type="SNV")
A GG insertion at position 5
# ref: ACTCT CTCTATCTCATCCAC
# alt: ACTCTGGCTCTATCTCATCCAC
insertion_5 = VariantInterval(start=5, end=6, sequence="GGC", variant_type="insertion")
A left-padded deletion from 10 to 13:
# ref: ACTCTCTCTATCTCATCCAC
# alt: ACTCTCTCTAT CATCCAC
deletion_11_13 = VariantInterval(start=10, end=13, sequence="T", variant_type="deletion")
A deletion from 13 to 15 without padding:
# ref: ACTCTCTCTATCTCATCCAC
# alt: ACTCTCTCTATCT TCCAC
deletion_13_15 = VariantInterval(start=13, end=15, sequence="", variant_type="deletion")
Module Contents
Classes
This is a wrapper over |
|
A container for many |
- class biocantor.gene.variants.VariantInterval(start: int, end: int, sequence: str, variant_type: str, phase_block: Optional[int] = None, guid: Optional[uuid.UUID] = None, variant_guid: Optional[uuid.UUID] = None, variant_name: Optional[str] = None, variant_id: Optional[str] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.location.Parent] = None)
Bases:
inscripta.biocantor.gene.interval.AbstractFeatureInterval
This is a wrapper over
AbstractInterval
that adds functions shared acrossTranscriptInterval
,FeatureInterval
, andVariantInterval
.- property id: str
Returns the ID of this feature. Provides a shared API across genes/transcripts and features.
- property name: str
Returns the name of this feature. Provides a shared API across genes/transcripts and features.
- property alternative_genomic_sequence: inscripta.biocantor.sequence.sequence.Sequence
Edited version of the original genomic sequence
- property parent_with_alternative_sequence: inscripta.biocantor.location.Parent
- property length_difference
- _identifiers = ['variant_name', 'variant_id']
- __str__()
Return str(self).
- __repr__()
Return repr(self).
- export_qualifiers(parent_qualifiers: Optional[Dict[Hashable, Set[Hashable]]] = None) Dict[Hashable, Set[str]]
Exports qualifiers for GFF3 or GenBank export. This merges top level keys with the arbitrary values
- abstract to_bed12(score: Optional[int] = 0, rgb: Optional[inscripta.biocantor.io.bed.RGB] = RGB(0, 0, 0), name: Optional[str] = 'feature_name', chromosome_relative_coordinates: bool = True) inscripta.biocantor.io.bed.BED12
Write a BED12 format representation of this
AbstractFeatureInterval
.Both of these optional arguments are specific to the BED12 format.
- Parameters
score – An optional score associated with a interval. UCSC requires an integer between 0 and 1000.
rgb – An optional RGB string for visualization on a browser. This allows you to have multiple colors on a single UCSC track.
name – Which identifier in this record to use as ‘name’. feature_name to guid. If the supplied string is not a valid attribute, it is used directly.
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a
sequence_chunk
ancestor type.
- Returns
A
BED12
object.- Raises
NoSuchAncestorException – If
chromosome_relative_coordinates
isFalse
but there is nosequence_chunk` ancestor type –
- abstract to_gff(parent: Optional[str] = None, parent_qualifiers: Optional[Dict] = None, chromosome_relative_coordinates: bool = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]
Writes a GFF format list of lists for this feature.
The additional qualifiers are used when writing a hierarchical relationship back to files. GFF files are easier to work with if the children features have the qualifiers of their parents.
- Parameters
parent – ID of the Parent of this transcript.
parent_qualifiers – Directly pull qualifiers in from this dictionary.
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a
sequence_chunk
ancestor type.
- Yields
- Raises
NoSuchAncestorException – If
chromosome_relative_coordinates
isFalse
but there is nosequence_chunk` ancestor type –
- abstract to_vcf()
- to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]
Dictionary to build Model representation. Defaults to always exporting in original chromosome relative coordinates, but this can be disabled to export in sequence-chunk relative coordinates.
If you have exported to sequence-chunk relative coordinates, and then try to re-instantiate, the subsequent object will now consider these new coordinates to be the original chromosome coordinates, and the relationship back to the true coordinates will be lost.
- static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.location.Parent] = None) VariantInterval
Build an interval from a dictionary representation
- lift_over_location(location: inscripta.biocantor.location.Location) inscripta.biocantor.location.Location
Construct a new Location that takes the alternative sequence defined by this VariantInterval into account.
The Location can be chunk-relative or chromosome-relative. It will be returned relative to the coordinate system of this VariantInterval.
- _lift_over_chromosome_location_single_interval(location: inscripta.biocantor.location.SingleInterval) inscripta.biocantor.location.Location
- _lift_over_chromosome_location_compound_interval(location: inscripta.biocantor.location.CompoundInterval) inscripta.biocantor.location.Location
- class biocantor.gene.variants.VariantIntervalCollection(variant_intervals: List[VariantInterval], variant_collection_name: Optional[str] = None, variant_collection_id: Optional[str] = None, sequence_name: Optional[str] = None, sequence_guid: Optional[uuid.UUID] = None, guid: Optional[uuid.UUID] = None, qualifiers: Optional[Dict[Hashable, List[inscripta.biocantor.gene.interval.QualifierValue]]] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.location.Parent] = None)
Bases:
inscripta.biocantor.gene.interval.AbstractFeatureIntervalCollection
A container for many
VariantInterval
. Assumes that the variants are all on the same haplotype.- property id: str
Returns the ID of this feature. Provides a shared API across genes/transcripts and features.
- property name: str
Returns the name of this feature. Provides a shared API across genes/transcripts and features.
- property alternative_genomic_sequence: inscripta.biocantor.sequence.sequence.Sequence
Edited version of the original sequence
- property parent_with_alternative_sequence: inscripta.biocantor.location.Parent
- interval_type
- _identifiers = ['variant_collection_name', 'variant_collection_id']
- __repr__()
Return repr(self).
- iter_children() Iterable[inscripta.biocantor.gene.interval.AbstractInterval]
Iterate over the children
- query_by_guids(id_or_ids: Union[uuid.UUID, List[uuid.UUID]]) VariantIntervalCollection
Filter this collection object by a list of unique IDs.
- Parameters
id_or_ids – List of GUIDs, or unique IDs. Can also be a single ID.
- to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]
Convert to a dict usable by
VariantIntervalCollectionModel
.
- static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.location.Parent] = None) VariantIntervalCollection
Build a
VariantIntervalCollection
from a dictionary representation
- abstract to_gff(chromosome_relative_coordinates: bool = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]
Writes a GFF format list of lists for this feature.
- Parameters
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a
sequence_chunk
ancestor type.- Yields
- Raises
NoSuchAncestorException – If
chromosome_relative_coordinates
isFalse
but there is nosequence_chunk` ancestor type –
- lift_over_location(location: inscripta.biocantor.location.Location) inscripta.biocantor.location.Location
Construct a new Location that takes the alternative sequence defined by this VariantIntervalCollection into account.
The Location can be chunk-relative or chromosome-relative. It will be returned relative to the coordinate system of this VariantInterval.