biocantor.gene.gene
Module Contents
Classes
A GeneInterval is a collection of TranscriptInterval for a specific locus.
- class biocantor.gene.gene.GeneInterval(transcripts: List[inscripta.biocantor.gene.transcript.TranscriptInterval], guid: Optional[uuid.UUID] = None, gene_id: Optional[str] = None, gene_symbol: Optional[str] = None, gene_type: Optional[inscripta.biocantor.gene.transcript.Biotype] = None, locus_tag: Optional[str] = None, qualifiers: Optional[Dict[Hashable, List[inscripta.biocantor.gene.interval.QualifierValue]]] = None, sequence_name: Optional[str] = None, sequence_guid: Optional[uuid.UUID] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.Parent] = None)
Bases: inscripta.biocantor.gene.interval.AbstractFeatureIntervalCollection
A GeneInterval is a collection of TranscriptInterval objects for a specific locus.
This is a traditional gene model: one continuous region defines the gene, and that region contains 1 to N subregions that are transcripts. These transcripts may or may not be coding, and there is no requirement that all transcripts have the same type. Each transcript consists of one or more intervals and can exist on either strand. There is no requirement that every transcript exist on the same strand.
The Strand of this gene interval is always the plus strand.
A GeneInterval cannot be empty; it must have at least one transcript.
If a primary_transcript is not provided, then it is inferred by the hierarchy of longest CDS followed by longest isoform.
- property id: str
Returns the ID of this gene. Provides a shared API across genes/transcripts and features.
- property name: str
Returns the name of this gene. Provides a shared API across genes/transcripts and features.
- property children_guids
Get all of the GUIDs for children.
Returns: A set of UUIDs
- interval_type
- _identifiers = ['gene_id', 'gene_symbol', 'locus_tag']
- __repr__()
Return repr(self).
- iter_children() Iterable[inscripta.biocantor.gene.transcript.TranscriptInterval]
Iterate over the children
- to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]
Convert to a dict usable by GeneIntervalModel.
- static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.Parent] = None) GeneInterval
Build a GeneInterval from a dictionary representation.
- get_primary_transcript() Union[inscripta.biocantor.gene.transcript.TranscriptInterval, None]
Get the primary transcript, if it exists.
- get_primary_cds() Union[inscripta.biocantor.gene.transcript.CDSInterval, None]
Get the CDS of the primary transcript, if it exists.
- get_primary_transcript_sequence() Union[inscripta.biocantor.sequence.Sequence, None]
Get the sequence of the primary transcript, if it exists.
- get_primary_feature() Union[inscripta.biocantor.gene.transcript.TranscriptInterval, None]
Convenience function that provides a shared API between features and transcripts.
- get_primary_feature_sequence() Union[inscripta.biocantor.sequence.Sequence, None]
Convenience function that provides a shared API between features and transcripts.
- get_primary_cds_sequence() Union[inscripta.biocantor.sequence.Sequence, None]
Get the CDS sequence of the primary transcript, if it exists.
- get_primary_protein() Union[inscripta.biocantor.sequence.Sequence, None]
Get the protein sequence of the primary transcript, if it exists.
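The primary-transcript accessors above rely on the inference rule stated in the class description (longest CDS, then longest isoform). The following is a minimal sketch of one reading of that rule, not BioCantor's implementation: `Tx` and `infer_primary` are hypothetical stand-ins, and treating "longest isoform" as a tie-breaker on total transcript length is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tx:
    """Hypothetical stand-in for a transcript; not a BioCantor class."""
    name: str
    transcript_length: int  # total length of the transcript's intervals
    cds_length: int = 0     # 0 means the transcript is non-coding


def infer_primary(transcripts: List[Tx]) -> Optional[Tx]:
    """Pick the longest CDS; fall back to the longest transcript on ties."""
    if not transcripts:
        return None
    # Tuples compare element by element, so CDS length is ranked first
    # and transcript length only matters when CDS lengths are equal.
    return max(transcripts, key=lambda t: (t.cds_length, t.transcript_length))


txs = [Tx("tx1", 1200, 300), Tx("tx2", 900, 450), Tx("tx3", 2000)]
primary = infer_primary(txs)
print(primary.name)
```

With only non-coding transcripts every CDS length is 0, so the same key selects the longest isoform.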
- _produce_merged_feature(intervals: List[inscripta.biocantor.location.Location]) inscripta.biocantor.gene.feature.FeatureInterval
Wrapper function used by both GeneInterval.get_merged_transcript() and GeneInterval.get_merged_cds().
- get_merged_feature() inscripta.biocantor.gene.feature.FeatureInterval
Generate a single FeatureInterval that merges all child features together.
This inherently has no translation and so is returned as a generic feature, not a transcript.
- get_merged_transcript() inscripta.biocantor.gene.feature.FeatureInterval
Generate a single FeatureInterval that merges all child features together.
This inherently has no translation and so is returned as a generic feature, not a transcript.
- get_merged_cds() inscripta.biocantor.gene.feature.FeatureInterval
Generate a single FeatureInterval that merges all CDS intervals.
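To illustrate what merging child intervals into a single feature could look like, here is a small sketch that computes the union of overlapping intervals. It assumes that "merge" means an interval union over half-open (start, end) coordinates; `merge_intervals` is a hypothetical helper and does not use BioCantor's Location types.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end); hypothetical stand-in


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return the sorted union of the input intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


exons_tx1 = [(0, 100), (200, 300)]
exons_tx2 = [(50, 150), (250, 400)]
merged = merge_intervals(exons_tx1 + exons_tx2)
print(merged)
```

The result has no reading frame of its own, which is consistent with the merged methods returning a generic feature instead of a transcript.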
- query_by_guids(id_or_ids: Union[uuid.UUID, List[uuid.UUID]]) Optional[GeneInterval]
Filter this gene interval object by a list of unique IDs.
- Parameters
id_or_ids – List of GUIDs, or unique IDs. Can also be a single ID.
- Returns
GeneInterval, or None if there are no matching guids.
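The calling convention described here (a single ID or a list of IDs, with None when nothing matches) can be sketched as follows. This is an illustration only: `filter_by_guids` is a hypothetical function operating on a plain dict of child names keyed by GUID, not the BioCantor method.

```python
import uuid
from typing import Dict, List, Optional, Union


def filter_by_guids(
    children: Dict[uuid.UUID, str],
    id_or_ids: Union[uuid.UUID, List[uuid.UUID]],
) -> Optional[Dict[uuid.UUID, str]]:
    """Keep only children whose GUID was requested; None if none match."""
    # Normalize the single-ID form to a list so both forms share one path.
    ids = [id_or_ids] if isinstance(id_or_ids, uuid.UUID) else list(id_or_ids)
    wanted = set(ids)
    kept = {guid: name for guid, name in children.items() if guid in wanted}
    return kept or None


guid_a, guid_b, guid_c = uuid.uuid4(), uuid.uuid4(), uuid.uuid4()
children = {guid_a: "tx1", guid_b: "tx2"}

single = filter_by_guids(children, guid_a)
several = filter_by_guids(children, [guid_a, guid_b])
missing = filter_by_guids(children, guid_c)
```

Returning None instead of an empty collection mirrors the class constraint that a GeneInterval must have at least one transcript.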
- to_gff(chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]
Produces an iterable of GFFRow for this gene and its children.
- Parameters
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.
raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.
- Yields
GFFRow
- Raises
NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
- incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) GeneInterval
Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new GeneInterval with those changes incorporated on every child.