`biocantor.gene.transcript`

Object representation of Transcripts.

Each object is capable of exporting itself to BED and GFF3.

Module Contents

Classes

TranscriptInterval

Transcripts are a collection of Intervals. The canonical transcript has a 5' UTR, a CDS, and a 3' UTR.

class biocantor.gene.transcript.TranscriptInterval(exon_starts: List[int], exon_ends: List[int], strand: inscripta.biocantor.location.strand.Strand, cds_starts: Optional[List[int]] = None, cds_ends: Optional[List[int]] = None, cds_frames: Optional[List[inscripta.biocantor.gene.cds_frame.CDSFrame]] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)

Bases: inscripta.biocantor.gene.interval.AbstractFeatureInterval

Transcripts are a collection of Intervals. The canonical transcript has a 5’ UTR, a CDS, and a 3’ UTR. However, a transcript may lack any of those three features: it could be noncoding (no CDS) or truncated (no 3’ UTR or 5’ UTR). It could also be a single exon.

This object represents all of these possibilities through one Location that represents the Exons, and an optional CDSInterval that represents the CDS.

In addition to the interval information, this object has optional metadata and an optional understanding of sequence.

property is_primary_tx: bool: Is this the primary transcript?

property cds_location: inscripta.biocantor.location.location.Location: Returns the Location of the CDS in chromosome coordinates

property cds_chunk_relative_location: inscripta.biocantor.location.location.Location: Returns the Location of the CDS in chunk relative coordinates

property chromosome_intron_location: Returns the Location of the Introns of this Transcript in chromosome coordinates

property chunk_relative_intron_location: Returns the Location of the Introns of this Transcript in chunk relative coordinates
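To illustrate what the intron properties describe, the gaps between consecutive exons can be derived from the exon_starts and exon_ends constructor arguments. The sketch below is plain Python, not BioCantor code: `introns_from_exons` is a hypothetical helper, and it assumes 0-based, half-open coordinates with sorted, non-overlapping exons.

```python
from typing import List, Tuple


def introns_from_exons(exon_starts: List[int], exon_ends: List[int]) -> List[Tuple[int, int]]:
    """Return the gaps between consecutive exons as (start, end) pairs.

    Hypothetical helper: assumes 0-based, half-open, sorted, non-overlapping exons.
    """
    # An intron runs from the end of one exon to the start of the next one.
    # Directly adjacent exons (end == next start) leave no gap and are skipped.
    return [
        (end, start)
        for end, start in zip(exon_ends[:-1], exon_starts[1:])
        if start > end
    ]


print(introns_from_exons([2, 10, 20], [6, 15, 25]))  # [(6, 10), (15, 20)]
```

A single-exon transcript yields an empty list, which matches the intuition that it has no introns.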

property is_coding: bool

property has_in_frame_stop: bool

property cds_size: int: CDS size, regardless of chunk relativity (does not shrink)

property chunk_relative_cds_size: int: Chunk relative CDS size (can shrink if the Location is a slice of the full transcript)

property cds_start: int

property cds_end: int

property chunk_relative_cds_start: int

property chunk_relative_cds_end: int

property cds_blocks: Iterable[inscripta.biocantor.location.location_impl.SingleInterval]: Wrapper for blocks function that reports blocks in chromosome coordinates

property chunk_relative_cds_blocks: List[inscripta.biocantor.location.location.Location]: Wrapper for blocks function that reports blocks in chunk-relative coordinates

property id: str: Returns the ID of this transcript. Provides a shared API across genes/transcripts and features.

property name: str: Returns the name of this transcript. Provides a shared API across genes/transcripts and features.

interval_type

_identifiers = ['transcript_id', 'transcript_symbol', 'protein_id']

__str__(): Return str(self).

__repr__(): Return repr(self).

__len__()

_reset_parent(parent: Optional[inscripta.biocantor.parent.parent.Parent] = None): Convenience function that wraps location.reset_parent(). Overrides the parent function in AbstractInterval in order to also update the CDS interval.

_liftover_this_location_to_seq_chunk_parent(seq_chunk_parent: inscripta.biocantor.parent.parent.Parent)

Lift this transcript interval to a new subset. Overrides the parent method in order to perform this task on the CDS interval as well.

This could happen as the result of a subsetting operation.

This will introduce chunk-relative coordinates to this interval, or reduce the size of existing chunk-relative coordinates.

This function calls the parent static method AbstractInterval.liftover_location_to_seq_chunk_parent(), but differs in two key ways:

1. It acts on an instantiated subclass of this abstract class, modifying the location.
2. It handles the case where a subclass is already a slice, by first lifting up to genomic coordinates.

For these reasons, and particularly #1, this is a private method that is intended to be used during construction of a subclass. Modifying the locations in place is generally a bad idea after initial construction of an interval class.

to_dict(chromosome_relative_coordinates: bool = True) → Dict[str, Any]: Convert to a dict usable by biocantor.io.models.TranscriptIntervalModel.

static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) → TranscriptInterval: Build a TranscriptInterval from a dictionary.

static from_location(location: inscripta.biocantor.location.location.Location, cds: Optional[inscripta.biocantor.gene.cds.CDSInterval] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None) → TranscriptInterval

static from_chunk_relative_location(location: inscripta.biocantor.location.location.Location, cds: Optional[inscripta.biocantor.gene.cds.CDSInterval] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None) → TranscriptInterval

Allows construction of a TranscriptInterval from a chunk-relative location. This is a location present on a sequence chunk, which should be built by the convenience function seq_chunk_to_parent:

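from inscripta.biocantor.io.parser import seq_chunk_to_parent
from inscripta.biocantor.location.location_impl import SingleInterval
from inscripta.biocantor.location.strand import Strand
# The 28 bp chunk covers positions 222213-222241 of NC_000913.3
parent = seq_chunk_to_parent('AANAAATGGCGAGCACCTAACCCCCNCC', "NC_000913.3", 222213, 222241)
loc = SingleInterval(5, 20, Strand.PLUS, parent=parent)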

This location can then be lifted back to chromosomal coordinates as follows:

loc.lift_over_to_first_ancestor_of_type("chromosome")
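For a plus-strand chunk, the arithmetic behind this lift amounts to adding the chunk's chromosomal start to the chunk-relative coordinates. The following is a plain-Python sketch of that arithmetic only, assuming 0-based, half-open coordinates; `chunk_to_chromosome` is a hypothetical helper and does not reproduce BioCantor's parent machinery.

```python
from typing import Tuple


def chunk_to_chromosome(chunk_start: int, rel_start: int, rel_end: int) -> Tuple[int, int]:
    """Shift a chunk-relative interval by the chunk's chromosomal start.

    Hypothetical helper: plus-strand chunk, 0-based, half-open coordinates.
    """
    return chunk_start + rel_start, chunk_start + rel_end


# The chunk starts at 222213 and the location spans 5-20 within it.
print(chunk_to_chromosome(222213, 5, 20))  # (222218, 222233)
```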

intersect(location: inscripta.biocantor.location.location.Location, new_guid: Optional[uuid.UUID] = None, new_qualifiers: Optional[dict] = None) → TranscriptInterval

Returns a new TranscriptInterval representing the intersection of this Transcript’s location with the other location.

Strand of the other location is ignored; returned Transcript is on the same strand as this Transcript.

If this Transcript has a CDS, it will be dropped because CDS intersections are not currently supported.
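Conceptually, the intersection clips each exon block to the bounds of the other location and discards blocks that no longer overlap. The sketch below shows that idea for a single contiguous query interval in plain Python; `intersect_blocks` is a hypothetical helper, coordinates are assumed to be 0-based and half-open, and strand is ignored, as described above.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, half-open


def intersect_blocks(blocks: List[Block], start: int, end: int) -> List[Block]:
    """Clip each block to [start, end) and keep only non-empty results."""
    clipped = []
    for block_start, block_end in blocks:
        lo, hi = max(block_start, start), min(block_end, end)
        if lo < hi:  # an empty overlap means the block lies outside the query
            clipped.append((lo, hi))
    return clipped


exons = [(2, 6), (10, 15), (20, 25)]
print(intersect_blocks(exons, 4, 12))  # [(4, 6), (10, 12)]
```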

sequence_pos_to_transcript(pos: int) → int: Converts sequence position to relative position along this transcript.

chunk_relative_pos_to_transcript(pos: int) → int: Converts chunk-relative sequence position to relative position along this transcript.

sequence_interval_to_transcript(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval on the sequence to a relative location within this transcript.

chunk_relative_interval_to_transcript(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval on the chunk-relative sequence to a relative location within this transcript.

transcript_pos_to_sequence(pos: int) → int: Converts a relative position along this transcript to sequence coordinate.

transcript_pos_to_chunk_relative(pos: int) → int: Converts a relative position along this transcript to chunk-relative sequence coordinate.

transcript_interval_to_sequence(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval relative to this transcript to a spliced location on the sequence.

transcript_interval_to_chunk_relative(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval relative to this transcript to a spliced location on the chunk-relative sequence.
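The position conversions above can be pictured as walking the exons in transcription order while counting spliced bases. The sketch below is a plain-Python illustration under assumed conventions (0-based, half-open exon blocks, with a boolean flag standing in for Strand); the function names mirror the methods for readability, but this is not BioCantor's implementation.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, half-open


def sequence_pos_to_transcript(exons: List[Block], pos: int, minus_strand: bool = False) -> int:
    """Map a sequence position to a 0-based offset along the spliced transcript."""
    offset = 0
    # Transcription order: left to right on plus, right to left on minus.
    for start, end in sorted(exons, reverse=minus_strand):
        if start <= pos < end:
            return offset + (end - 1 - pos if minus_strand else pos - start)
        offset += end - start
    raise ValueError(f"position {pos} does not fall within an exon")


def transcript_pos_to_sequence(exons: List[Block], pos: int, minus_strand: bool = False) -> int:
    """Map a 0-based transcript offset back to a sequence position."""
    if pos < 0:
        raise ValueError("transcript position must be non-negative")
    remaining = pos
    for start, end in sorted(exons, reverse=minus_strand):
        length = end - start
        if remaining < length:
            return end - 1 - remaining if minus_strand else start + remaining
        remaining -= length
    raise ValueError(f"position {pos} is beyond the end of the transcript")


exons = [(2, 6), (10, 15), (20, 25)]
print(sequence_pos_to_transcript(exons, 12))                      # 6
print(sequence_pos_to_transcript(exons, 12, minus_strand=True))   # 7
print(transcript_pos_to_sequence(exons, 6))                       # 12
```

Positions that fall in an intron have no transcript-relative equivalent, which is why the sketch raises an error rather than returning a value.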

cds_pos_to_sequence(pos: int) → int: Converts a relative position along the CDS to sequence coordinate.

cds_pos_to_chunk_relative(pos: int) → int: Converts a relative position along the CDS to chunk-relative sequence coordinate.

cds_interval_to_sequence(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval relative to the CDS to a spliced location on the sequence.

cds_interval_to_chunk_relative(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval relative to the CDS to a spliced location on the chunk-relative sequence.

sequence_pos_to_cds(pos: int) → int: Converts sequence position to relative position along the CDS.

chunk_relative_pos_to_cds(pos: int) → int: Converts chunk-relative sequence position to relative position along the CDS.

sequence_interval_to_cds(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval on the sequence to a relative location within the CDS.

chunk_relative_interval_to_cds(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) → inscripta.biocantor.location.location.Location: Converts a contiguous interval on the chunk-relative sequence to a relative location within the CDS.

cds_pos_to_transcript(pos: int) → int: Converts a relative position along the CDS to a relative position along this transcript.

transcript_pos_to_cds(pos: int) → int: Converts a relative position along this transcript to a relative position along the CDS.
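The relationship between CDS-relative and transcript-relative positions is an offset equal to the number of spliced bases upstream of the CDS. The plain-Python sketch below illustrates this for a plus-strand transcript only; the helper `exonic_bases_before` and the argument layout are hypothetical, and coordinates are assumed to be 0-based and half-open.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, half-open


def exonic_bases_before(exons: List[Block], boundary: int) -> int:
    """Count exonic bases that lie to the left of a sequence coordinate."""
    return sum(max(0, min(end, boundary) - start) for start, end in exons)


def cds_pos_to_transcript(exons: List[Block], cds_start: int, cds_pos: int) -> int:
    """Plus strand only: shift a CDS offset by the length of the 5' UTR."""
    return exonic_bases_before(exons, cds_start) + cds_pos


def transcript_pos_to_cds(exons: List[Block], cds_start: int, cds_end: int, tx_pos: int) -> int:
    """Plus strand only: inverse of cds_pos_to_transcript, with bounds checking."""
    utr5_length = exonic_bases_before(exons, cds_start)
    cds_length = exonic_bases_before(exons, cds_end) - utr5_length
    cds_pos = tx_pos - utr5_length
    if not 0 <= cds_pos < cds_length:
        raise ValueError(f"transcript position {tx_pos} is outside the CDS")
    return cds_pos


exons = [(2, 6), (10, 15), (20, 25)]
print(cds_pos_to_transcript(exons, 4, 0))      # 2
print(transcript_pos_to_cds(exons, 4, 22, 5))  # 3
```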

get_5p_interval() → inscripta.biocantor.location.location.Location

Return the 5’ UTR as a Location, if it exists.

WARNING: If this is a chunk-relative transcript, the result of this function will also be chunk-relative.

get_3p_interval() → inscripta.biocantor.location.location.Location

Returns the 3’ UTR as a location, if it exists.

WARNING: If this is a chunk-relative transcript, the result of this function will also be chunk-relative.
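As a rough picture of what the two UTR methods return, the UTRs are the exonic regions on either side of the CDS, with the 5’ and 3’ roles swapping on the minus strand. The sketch below is plain Python with hypothetical helpers (`clip`, `utr_blocks`), assuming 0-based, half-open, sorted exon blocks and a boolean flag in place of Strand.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, half-open


def clip(blocks: List[Block], start: int, end: int) -> List[Block]:
    """Keep the parts of each block that fall inside [start, end)."""
    clipped = [(max(s, start), min(e, end)) for s, e in blocks]
    return [(s, e) for s, e in clipped if s < e]


def utr_blocks(
    exons: List[Block], cds_start: int, cds_end: int, minus_strand: bool = False
) -> Tuple[List[Block], List[Block]]:
    """Return (five_prime_utr, three_prime_utr) as lists of blocks."""
    left = clip(exons, exons[0][0], cds_start)   # exonic bases left of the CDS
    right = clip(exons, cds_end, exons[-1][1])   # exonic bases right of the CDS
    # On the minus strand, transcription runs right to left, so the roles swap.
    return (right, left) if minus_strand else (left, right)


exons = [(2, 6), (10, 15), (20, 25)]
print(utr_blocks(exons, 4, 22))  # ([(2, 4)], [(22, 25)])
```

An empty list in either position corresponds to a transcript with no UTR on that side.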

get_transcript_sequence() → inscripta.biocantor.sequence.sequence.Sequence: Returns the mRNA sequence.

get_cds_sequence() → inscripta.biocantor.sequence.sequence.Sequence: Returns the in-frame CDS sequence (always multiple of 3).

get_protein_sequence(truncate_at_in_frame_stop: Optional[bool] = False, translation_table: Optional[inscripta.biocantor.gene.codon.TranslationTable] = TranslationTable.DEFAULT) → inscripta.biocantor.sequence.sequence.Sequence: Return the translation of this transcript, if possible.
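To make the effect of truncate_at_in_frame_stop concrete, here is a minimal translation sketch in plain Python using the standard genetic code. The `translate` function is hypothetical and works on a plain string rather than a Sequence; in particular, it cuts at the first stop codon of any kind, and how BioCantor itself treats a terminal stop codon is not covered here.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, truncate_at_in_frame_stop: bool = False) -> str:
    """Translate a CDS string codon by codon; unknown codons become 'X'."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3].upper(), "X") for i in range(0, usable, 3)
    )
    if truncate_at_in_frame_stop and "*" in protein:
        protein = protein[: protein.index("*")]
    return protein


print(translate("ATGGCGAGCACCTAA"))                                  # MAST*
print(translate("ATGGCGAGCACCTAA", truncate_at_in_frame_stop=True))  # MAST
```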

export_qualifiers(parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None) → Dict[Hashable, Set[Hashable]]: Exports qualifiers for GFF3/GenBank export

to_gff(parent: Optional[str] = None, parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None, chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) → Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]

Writes a GFF format list of lists for this transcript.

The additional qualifiers are used when writing a hierarchical relationship back to files. GFF files are easier to work with if the children features have the qualifiers of their parents.

Parameters

parent – ID of the Parent of this transcript.
parent_qualifiers – Directly pull qualifiers in from this dictionary.
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.
raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.

Yields

GFFRow

Raises

NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
GFF3MissingSequenceNameError – If there are no sequence names associated with this transcript.
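For orientation, a GFF3 record is a tab-separated line of nine columns with 1-based, inclusive coordinates. The sketch below formats one such line in plain Python; `gff3_line` is a hypothetical helper, it assumes the input coordinates are 0-based and half-open, and it does not use or reproduce GFFRow.

```python
from typing import Dict


def gff3_line(
    seqid: str,
    feature_type: str,
    start: int,
    end: int,
    strand: str,
    attributes: Dict[str, str],
    source: str = ".",
    score: str = ".",
    phase: str = ".",
) -> str:
    """Format one GFF3 line from 0-based, half-open input coordinates."""
    # Column 9 holds key=value pairs separated by semicolons.
    attribute_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    # GFF3 is 1-based and inclusive: shift the start, keep the end.
    columns = [seqid, source, feature_type, str(start + 1), str(end),
               score, strand, phase, attribute_text]
    return "\t".join(columns)


print(gff3_line("chr1", "transcript", 2, 25, "+", {"ID": "tx1", "Parent": "gene1"}))
```

Copying parent qualifiers into the attributes dictionary before formatting is one way to get the "children carry their parents' qualifiers" behavior described above.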

to_bed12(score: Optional[int] = 0, rgb: Optional[inscripta.biocantor.io.bed.RGB] = RGB(0, 0, 0), name: Optional[str] = 'transcript_symbol', chromosome_relative_coordinates: bool = True) → inscripta.biocantor.io.bed.BED12

Write a BED12 format representation of this TranscriptInterval.

The score and rgb arguments are specific to the BED12 format.

Parameters

score – An optional score associated with an interval. UCSC requires an integer between 0 and 1000.
rgb – An optional RGB string for visualization on a browser. This allows you to have multiple colors on a single UCSC track.
name – Which identifier in this record to use as ‘name’; defaults to transcript_symbol. If the supplied string is not a valid attribute, it is used directly.
chromosome_relative_coordinates – Output BED12 in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.

Returns

A BED12 object.

Raises

NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
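The BED12 layout encodes exons as block sizes plus block starts relative to the transcript start, and the CDS as the thick region. The sketch below builds such a line in plain Python; `bed12_line` is a hypothetical helper that assumes 0-based, half-open, sorted exon blocks and does not use the BED12 or RGB classes.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, half-open


def bed12_line(
    chrom: str,
    exons: List[Block],
    strand: str,
    name: str,
    cds_start: Optional[int] = None,
    cds_end: Optional[int] = None,
    score: int = 0,
    rgb: Tuple[int, int, int] = (0, 0, 0),
) -> str:
    """Format a 12-column BED line for a spliced feature."""
    chrom_start, chrom_end = exons[0][0], exons[-1][1]
    # With no CDS, thickStart and thickEnd are both set to chromStart.
    thick_start = chrom_start if cds_start is None else cds_start
    thick_end = chrom_start if cds_end is None else cds_end
    block_sizes = ",".join(str(end - start) for start, end in exons)
    block_starts = ",".join(str(start - chrom_start) for start, _ in exons)
    fields = [chrom, chrom_start, chrom_end, name, score, strand,
              thick_start, thick_end, ",".join(map(str, rgb)),
              len(exons), block_sizes, block_starts]
    return "\t".join(map(str, fields))


exons = [(2, 6), (10, 15), (20, 25)]
print(bed12_line("chr1", exons, "+", "tx1", cds_start=4, cds_end=22))
```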

incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) → TranscriptInterval: Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new TranscriptInterval with those changes incorporated.
