biocantor.gene.gene

Module Contents

Classes

GeneInterval

A GeneInterval is a collection of TranscriptInterval for a specific locus.

class biocantor.gene.gene.GeneInterval(transcripts: List[inscripta.biocantor.gene.transcript.TranscriptInterval], guid: Optional[uuid.UUID] = None, gene_id: Optional[str] = None, gene_symbol: Optional[str] = None, gene_type: Optional[inscripta.biocantor.gene.transcript.Biotype] = None, locus_tag: Optional[str] = None, qualifiers: Optional[Dict[Hashable, List[inscripta.biocantor.gene.interval.QualifierValue]]] = None, sequence_name: Optional[str] = None, sequence_guid: Optional[uuid.UUID] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.Parent] = None)

Bases: inscripta.biocantor.gene.interval.AbstractFeatureIntervalCollection

A GeneInterval is a collection of TranscriptInterval for a specific locus.

This is a traditional gene model: one continuous region defines the gene, and that region contains 1 to N subregions that are transcripts. These transcripts may or may not be coding, and there is no requirement that all transcripts have the same type. Each transcript consists of one or more intervals and can exist on either strand; transcripts of the same gene are not required to share a strand.

The Strand of this gene interval is always the plus strand.

This cannot be empty; it must have at least one transcript.

If a primary_transcript is not provided, then it is inferred by the hierarchy of longest CDS followed by longest isoform.
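The inference rule above can be illustrated with a minimal, self-contained sketch. It does not use BioCantor itself: ToyTranscript, its fields, and infer_primary are hypothetical names invented for this example, and ranking by CDS length first with isoform length as the tie-breaker is one plausible reading of "longest CDS followed by longest isoform", not a statement of the library's implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyTranscript:
    """Hypothetical stand-in for a transcript; not the BioCantor class."""

    name: str
    length: int          # total exonic length in bases
    cds_length: int = 0  # 0 means the transcript is non-coding
    is_primary: bool = False


def infer_primary(transcripts: List[ToyTranscript]) -> ToyTranscript:
    """Return the flagged primary transcript, or infer one if none is flagged."""
    if not transcripts:
        # Mirrors the documented rule that a gene cannot be empty.
        raise ValueError("a gene must have at least one transcript")
    flagged = [t for t in transcripts if t.is_primary]
    if flagged:
        return flagged[0]
    # Longest CDS wins; isoform length breaks ties and also decides
    # the outcome when every transcript is non-coding (cds_length == 0).
    return max(transcripts, key=lambda t: (t.cds_length, t.length))


isoforms = [
    ToyTranscript("tx1", length=1200, cds_length=300),
    ToyTranscript("tx2", length=900, cds_length=600),
    ToyTranscript("tx3", length=2000),
]
primary = infer_primary(isoforms)
print(primary.name)  # tx2
```

In this sketch tx2 is selected even though tx3 is the longest isoform, because CDS length takes precedence over isoform length.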

property is_coding: bool

True if this gene has one or more coding isoforms.

property id: str

Returns the ID of this gene. Provides a shared API across genes/transcripts and features.

property name: str

Returns the name of this gene. Provides a shared API across genes/transcripts and features.

property children_guids

Get all of the GUIDs for children.

Returns: A set of UUIDs

interval_type

_identifiers = ['gene_id', 'gene_symbol', 'locus_tag']

__repr__()

Return repr(self).

iter_children() Iterable[inscripta.biocantor.gene.transcript.TranscriptInterval]

Iterate over the children.

to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]

Convert to a dict usable by GeneIntervalModel.

static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.Parent] = None) GeneInterval

Build a GeneInterval from a dictionary representation.

get_primary_transcript() Union[inscripta.biocantor.gene.transcript.TranscriptInterval, None]

Get the primary transcript, if it exists.

get_primary_cds() Union[inscripta.biocantor.gene.transcript.CDSInterval, None]

Get the CDS of the primary transcript, if it exists.

get_primary_transcript_sequence() Union[inscripta.biocantor.sequence.Sequence, None]

Get the sequence of the primary transcript, if it exists.

get_primary_feature() Union[inscripta.biocantor.gene.transcript.TranscriptInterval, None]

Convenience function that provides a shared API between features and transcripts. Returns the primary transcript, if it exists.

get_primary_feature_sequence() Union[inscripta.biocantor.sequence.Sequence, None]

Convenience function that provides a shared API between features and transcripts. Returns the sequence of the primary transcript, if it exists.

get_primary_cds_sequence() Union[inscripta.biocantor.sequence.Sequence, None]

Get the CDS sequence of the primary transcript, if it exists.

get_primary_protein() Union[inscripta.biocantor.sequence.Sequence, None]

Get the protein sequence of the primary transcript, if it exists.

_produce_merged_feature(intervals: List[inscripta.biocantor.location.Location]) inscripta.biocantor.gene.feature.FeatureInterval

Wrapper function used by both GeneInterval.get_merged_transcript() and GeneInterval.get_merged_cds().

get_merged_feature() inscripta.biocantor.gene.feature.FeatureInterval

Generate a single FeatureInterval that merges all child features together.

This inherently has no translation and so is returned as a generic feature, not a transcript.

get_merged_transcript() inscripta.biocantor.gene.feature.FeatureInterval

Generate a single FeatureInterval that merges all child features together.

This inherently has no translation and so is returned as a generic feature, not a transcript.

get_merged_cds() inscripta.biocantor.gene.feature.FeatureInterval

Generate a single FeatureInterval that merges all CDS intervals.
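The merge operations described above amount to taking the union of the child intervals. The following sketch shows that idea on plain tuples. It is independent of BioCantor: merge_intervals is a hypothetical helper, and the 0-based half-open coordinates are an assumption made for this example.

```python
from typing import List, Tuple

# Assumed representation: 0-based, half-open (start, end) pairs.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or abutting intervals into their union."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Exons of two hypothetical isoforms of the same gene.
tx1_exons = [(0, 100), (200, 300)]
tx2_exons = [(50, 150), (250, 400)]
merged = merge_intervals(tx1_exons + tx2_exons)
print(merged)  # [(0, 150), (200, 400)]
```

The merged result is a set of blocks with no reading frame of its own, which is consistent with the merged object being returned as a generic feature rather than a transcript.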

export_qualifiers() Dict[Hashable, Set[str]]

Exports qualifiers for GFF3/GenBank export.

query_by_guids(id_or_ids: Union[uuid.UUID, List[uuid.UUID]]) Optional[GeneInterval]

Filter this gene interval object by a list of unique IDs.

Parameters

id_or_ids – List of GUIDs, or unique IDs. Can also be a single ID.

Returns

GeneInterval, or None if there are no matching guids.
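The calling convention described above (a single ID or a list of IDs in, None out when nothing matches) can be sketched without BioCantor as follows. filter_by_guids and the dictionary of children are hypothetical stand-ins for illustration; the real method returns a GeneInterval, not a dict.

```python
import uuid
from typing import Dict, List, Optional, Union


def filter_by_guids(
    children: Dict[uuid.UUID, str],
    id_or_ids: Union[uuid.UUID, List[uuid.UUID]],
) -> Optional[Dict[uuid.UUID, str]]:
    """Keep only the children whose GUID was requested; None if none match."""
    # Normalize a single UUID into a one-element collection.
    ids = [id_or_ids] if isinstance(id_or_ids, uuid.UUID) else list(id_or_ids)
    wanted = set(ids)
    kept = {guid: name for guid, name in children.items() if guid in wanted}
    return kept or None


guid_a, guid_b = uuid.uuid4(), uuid.uuid4()
children = {guid_a: "tx1", guid_b: "tx2"}

print(filter_by_guids(children, guid_a))           # single ID
print(filter_by_guids(children, [guid_a, guid_b])) # list of IDs
print(filter_by_guids(children, uuid.uuid4()))     # no match
```

Returning None rather than an empty container matches the documented rule that a gene cannot be empty.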

to_gff(chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]

Produces iterable of GFFRow for this gene and its children.

Parameters
  • chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.

  • raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.

Yields

GFFRow

Raises
  • NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
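The raise_on_reserved_attributes switch can be pictured with a short standalone sketch. Nothing here is BioCantor code: check_reserved and ReservedAttributeError are hypothetical names, and the reserved set is the list of predefined attribute tags from the GFF3 specification. Which of these tags BioCantor actually checks is an assumption.

```python
import warnings
from typing import Dict, Hashable, List, Set

# Attribute tags with predefined meanings in the GFF3 specification.
GFF3_RESERVED = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}


class ReservedAttributeError(ValueError):
    """Hypothetical exception type used only in this sketch."""


def check_reserved(
    qualifiers: Dict[Hashable, List[str]],
    raise_on_reserved_attributes: bool = True,
) -> Set[Hashable]:
    """Return qualifier keys that clash with reserved tags; raise or warn on a clash."""
    clashes = {key for key in qualifiers if key in GFF3_RESERVED}
    if clashes:
        message = f"reserved GFF3 attributes in qualifiers: {sorted(clashes)}"
        if raise_on_reserved_attributes:
            raise ReservedAttributeError(message)
        warnings.warn(message)
    return clashes


qualifiers = {"ID": ["gene1"], "product": ["kinase"]}
# With the flag off, the clash is reported as a warning and returned.
found = check_reserved(qualifiers, raise_on_reserved_attributes=False)
print(found)  # {'ID'}
```

With the default of True, the same call raises instead of warning, which is the stricter behavior the parameter description refers to.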

incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) GeneInterval

Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new GeneInterval with those changes incorporated on every child.