`biocantor.gene.gene`

Module Contents

Classes

GeneInterval

A GeneInterval is a collection of TranscriptInterval for a specific locus.

class biocantor.gene.gene.GeneInterval(transcripts: List[inscripta.biocantor.gene.transcript.TranscriptInterval], guid: Optional[uuid.UUID] = None, gene_id: Optional[str] = None, gene_symbol: Optional[str] = None, gene_type: Optional[inscripta.biocantor.gene.transcript.Biotype] = None, locus_tag: Optional[str] = None, qualifiers: Optional[Dict[Hashable, List[inscripta.biocantor.gene.interval.QualifierValue]]] = None, sequence_name: Optional[str] = None, sequence_guid: Optional[uuid.UUID] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.Parent] = None)

Bases: inscripta.biocantor.gene.interval.AbstractFeatureIntervalCollection

A GeneInterval is a collection of TranscriptInterval for a specific locus.

This is a traditional gene model: one continuous region defines the gene, and that region contains 1 to N subregions that are transcripts. These transcripts may or may not be coding, and they are not required to share the same type. Each transcript consists of one or more intervals and can exist on either strand; transcripts are not required to all be on the same strand.

The Strand of this gene interval is always the plus strand.

This cannot be empty; it must have at least one transcript.

If a primary_transcript is not provided, then it is inferred by the hierarchy of longest CDS followed by longest isoform.
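The inference hierarchy above can be sketched in plain Python. This is a minimal illustration, not the BioCantor implementation: `ToyTranscript` and `infer_primary` are hypothetical names, and breaking CDS-length ties by total transcript length is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyTranscript:
    """Hypothetical stand-in for a transcript; not the BioCantor class."""

    name: str
    length: int  # total transcript length in bases
    cds_length: Optional[int] = None  # None for a non-coding transcript


def infer_primary(transcripts: List[ToyTranscript]) -> ToyTranscript:
    """Prefer the longest CDS; fall back to the longest isoform."""
    if not transcripts:
        raise ValueError("a gene must have at least one transcript")
    coding = [tx for tx in transcripts if tx.cds_length is not None]
    if coding:
        # Assumption: ties on CDS length are broken by transcript length.
        return max(coding, key=lambda tx: (tx.cds_length, tx.length))
    # No coding isoforms at all: choose the longest isoform.
    return max(transcripts, key=lambda tx: tx.length)
```

With one non-coding isoform and two coding isoforms, the sketch picks a coding isoform even when the non-coding one is longer overall.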

property is_coding: bool: True if this gene has one or more coding isoforms.

property id: str: Returns the ID of this gene. Provides a shared API across genes/transcripts and features.

property name: str: Returns the name of this gene. Provides a shared API across genes/transcripts and features.

property children_guids

Get all of the GUIDs for children.

Returns: A set of UUIDs

interval_type

_identifiers = ['gene_id', 'gene_symbol', 'locus_tag']

__repr__(): Return repr(self).

iter_children() → Iterable[inscripta.biocantor.gene.transcript.TranscriptInterval]: Iterate over the children

to_dict(chromosome_relative_coordinates: bool = True) → Dict[str, Any]: Convert to a dict usable by GeneIntervalModel.

static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.Parent] = None) → GeneInterval: Build a GeneInterval from a dictionary representation

get_primary_transcript() → Union[inscripta.biocantor.gene.transcript.TranscriptInterval, None]: Get the primary transcript, if it exists.

get_primary_cds() → Union[inscripta.biocantor.gene.transcript.CDSInterval, None]: Get the CDS of the primary transcript, if it exists.

get_primary_transcript_sequence() → Union[inscripta.biocantor.sequence.Sequence, None]: Get the sequence of the primary transcript, if it exists.

get_primary_feature() → Union[inscripta.biocantor.gene.transcript.TranscriptInterval, None]: Convenience function that provides shared API between features and transcripts

get_primary_feature_sequence() → Union[inscripta.biocantor.sequence.Sequence, None]: Convenience function that provides shared API between features and transcripts

get_primary_cds_sequence() → Union[inscripta.biocantor.sequence.Sequence, None]: Get the CDS sequence of the primary transcript, if it exists.

get_primary_protein() → Union[inscripta.biocantor.sequence.Sequence, None]: Get the protein sequence of the primary transcript, if it exists.

_produce_merged_feature(intervals: List[inscripta.biocantor.location.Location]) → inscripta.biocantor.gene.feature.FeatureInterval: Wrapper function used by both GeneInterval.get_merged_transcript() and GeneInterval.get_merged_cds().

get_merged_feature() → inscripta.biocantor.gene.feature.FeatureInterval

Generate a single FeatureInterval that merges all child features together.

This inherently has no translation and so is returned as a generic feature, not a transcript.

get_merged_transcript() → inscripta.biocantor.gene.feature.FeatureInterval

Generate a single FeatureInterval that merges all child features together.

This inherently has no translation and so is returned as a generic feature, not a transcript.

get_merged_cds() → inscripta.biocantor.gene.feature.FeatureInterval: Generate a single FeatureInterval that merges all CDS intervals.
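As a rough illustration of what merging intervals means for the merged-feature methods above, the sketch below takes the union of a list of intervals. It is not the BioCantor implementation: `merge_intervals` is a hypothetical helper, coordinates are assumed to be 0-based half-open `(start, end)` pairs on a single strand, and merging intervals that merely touch is an assumption.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), assumed 0-based half-open


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return the sorted union of the input intervals."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous block: extend that block.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            # Gap before this interval: start a new block.
            merged.append((start, end))
    return merged
```

For example, exons `(10, 20)` and `(15, 30)` from two isoforms collapse into one block `(10, 30)`, while a separate exon `(40, 50)` stays its own block.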

export_qualifiers() → Dict[Hashable, Set[str]]: Exports qualifiers for use in GFF3/GenBank output.

query_by_guids(id_or_ids: Union[uuid.UUID, List[uuid.UUID]]) → Optional[GeneInterval]

Filter this gene interval object by a list of unique IDs.

Parameters: id_or_ids – List of GUIDs, or unique IDs. Can also be a single ID.
Returns: GeneInterval, or None if there are no matching guids.
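The calling convention described above (a single ID or a list of IDs in, a filtered object or None out) can be sketched as follows. This is a toy model only: `filter_by_guids` is a hypothetical function, and a plain dict keyed by GUID stands in for the gene's children.

```python
import uuid
from typing import Dict, List, Optional, Union


def filter_by_guids(
    children: Dict[uuid.UUID, str],
    id_or_ids: Union[uuid.UUID, List[uuid.UUID]],
) -> Optional[Dict[uuid.UUID, str]]:
    """Keep only children whose GUID was requested; None if nothing matches."""
    # Normalize a single UUID into a one-element set.
    if isinstance(id_or_ids, uuid.UUID):
        wanted = {id_or_ids}
    else:
        wanted = set(id_or_ids)
    kept = {guid: child for guid, child in children.items() if guid in wanted}
    # An empty result is reported as None rather than an empty container.
    return kept or None
```

Callers should therefore check for None before using the result.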

to_gff(chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) → Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]

Produces an iterable of GFFRow objects for this gene and its children.

Parameters

chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? If False, an exception is raised when there is no sequence_chunk ancestor type.
raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.

Yields

GFFRow

Raises

NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
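The effect of raise_on_reserved_attributes can be sketched as below. This is an illustration only: `check_reserved_qualifiers` is a hypothetical helper, `ValueError` stands in for whatever exception the library actually raises, and the reserved set is taken from the GFF3 specification rather than from BioCantor itself.

```python
import warnings
from typing import Dict, List

# Attribute tags with predefined meanings in the GFF3 specification.
GFF3_RESERVED = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def check_reserved_qualifiers(
    qualifiers: Dict[str, List[str]],
    raise_on_reserved_attributes: bool = True,
) -> None:
    """Raise or warn when qualifier keys collide with reserved GFF3 attributes."""
    clashes = sorted(GFF3_RESERVED & set(qualifiers))
    if not clashes:
        return
    message = f"Qualifiers use reserved GFF3 attributes: {', '.join(clashes)}"
    if raise_on_reserved_attributes:
        # Stand-in exception type; the real library may use its own class.
        raise ValueError(message)
    warnings.warn(message)
```

Passing False downgrades the collision to a warning, which may be preferable when exporting annotations whose qualifiers were parsed from an existing GFF3 file.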

incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) → GeneInterval: Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new GeneInterval with those changes incorporated on every child.
