biocantor.gene.transcript

Object representation of Transcripts.

Each object is capable of exporting itself to BED and GFF3.

Module Contents

Classes

TranscriptInterval

Transcripts are a collection of Intervals. The canonical transcript has a 5' UTR, a CDS, and a 3' UTR.

class biocantor.gene.transcript.TranscriptInterval(exon_starts: List[int], exon_ends: List[int], strand: inscripta.biocantor.location.strand.Strand, cds_starts: Optional[List[int]] = None, cds_ends: Optional[List[int]] = None, cds_frames: Optional[List[inscripta.biocantor.gene.cds_frame.CDSFrame]] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)

Bases: inscripta.biocantor.gene.interval.AbstractFeatureInterval

Transcripts are a collection of Intervals. The canonical transcript has a 5’ UTR, a CDS, and a 3’ UTR. However, a transcript may lack any of those three features: it could be noncoding (no CDS) or truncated (no 3’ UTR or 5’ UTR). It could also be a single exon.

This object represents all of these possibilities through one Location that represents the exons, and an optional CDSInterval that represents the CDS.

In addition to the interval information, this object has optional metadata and an optional understanding of sequence.

property is_primary_tx: bool

Is this the primary transcript?

property cds_location: inscripta.biocantor.location.location.Location

Returns the Location of the CDS in chromosome coordinates

property cds_chunk_relative_location: inscripta.biocantor.location.location.Location

Returns the Location of the CDS in chunk relative coordinates

property chromosome_intron_location

Returns the Location of the Introns of this Transcript in chromosome coordinates

property chunk_relative_intron_location

Returns the Location of the Introns of this Transcript in chunk relative coordinates
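
As an illustration of what an intron location contains, the following minimal sketch derives intron blocks from exon blocks in plain Python. It does not use BioCantor; the helper name toy_intron_blocks and the 0-based, half-open coordinate convention are assumptions made for this example.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end); 0-based, half-open is an assumption of this sketch


def toy_intron_blocks(exon_starts: List[int], exon_ends: List[int]) -> List[Block]:
    """Return the gaps between consecutive exons, ordered by start coordinate."""
    exons = sorted(zip(exon_starts, exon_ends))
    # Each intron runs from the end of one exon to the start of the next.
    return [(left_end, right_start) for (_, left_end), (right_start, _) in zip(exons, exons[1:])]


print(toy_intron_blocks([10, 50, 90], [20, 60, 100]))  # [(20, 50), (60, 90)]
```

A single-exon transcript yields an empty list in this sketch, since there are no gaps to report.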

property is_coding: bool
property has_in_frame_stop: bool
property cds_size: int

CDS size, regardless of chunk relativity (does not shrink)

property chunk_relative_cds_size: int

Chunk relative CDS size (can shrink if the Location is a slice of the full transcript)
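
The difference between the two sizes can be pictured with a small plain-Python sketch. It is not BioCantor code: toy_size and toy_chunk_relative are hypothetical helpers, and the coordinates are assumed to be 0-based and half-open.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def toy_size(blocks: List[Block]) -> int:
    """Total number of bases covered by the blocks."""
    return sum(end - start for start, end in blocks)


def toy_chunk_relative(blocks: List[Block], chunk_start: int, chunk_end: int) -> List[Block]:
    """Clip blocks to the chunk window, then shift so the chunk starts at 0."""
    clipped = []
    for start, end in blocks:
        lo, hi = max(start, chunk_start), min(end, chunk_end)
        if lo < hi:  # keep only the part of the block that lies inside the chunk
            clipped.append((lo - chunk_start, hi - chunk_start))
    return clipped


cds_blocks = [(12, 20), (50, 60), (90, 97)]
chunk_blocks = toy_chunk_relative(cds_blocks, chunk_start=15, chunk_end=55)
print(toy_size(cds_blocks))    # 25
print(chunk_blocks)            # [(0, 5), (35, 40)]
print(toy_size(chunk_blocks))  # 10
```

The full CDS covers 25 bases, but only 10 of them fall inside the chunk, which is the sense in which the chunk-relative size can shrink.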

property cds_start: int
property cds_end: int
property chunk_relative_cds_start: int
property chunk_relative_cds_end: int
property cds_blocks: Iterable[inscripta.biocantor.location.location_impl.SingleInterval]

Wrapper for blocks function that reports blocks in chromosome coordinates

property chunk_relative_cds_blocks: List[inscripta.biocantor.location.location.Location]

Wrapper for blocks function that reports blocks in chunk-relative coordinates

property id: str

Returns the ID of this transcript. Provides a shared API across genes/transcripts and features.

property name: str

Returns the name of this transcript. Provides a shared API across genes/transcripts and features.

interval_type
_identifiers = ['transcript_id', 'transcript_symbol', 'protein_id']
__str__()

Return str(self).

__repr__()

Return repr(self).

__len__()
_reset_parent(parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)

Convenience function that wraps location.reset_parent(). Overrides the parent function in AbstractInterval in order to also update the CDS interval.

_liftover_this_location_to_seq_chunk_parent(seq_chunk_parent: inscripta.biocantor.parent.parent.Parent)

Lift this transcript interval to a new subset. Overrides the parent method in order to perform this task on the CDS interval as well.

This could happen as the result of a subsetting operation.

This will introduce chunk-relative coordinates to this interval, or reduce the size of existing chunk-relative coordinates.

This function calls the parent static method AbstractInterval.liftover_location_to_seq_chunk_parent(), but differs in two key ways:

  1. It acts on an instantiated subclass of this abstract class, modifying the location.

  2. It handles the case where a subclass is already a slice, by first lifting up to genomic coordinates.

For these reasons, and particularly #1, this is a private method that is intended to be used during construction of a subclass. Modifying locations in place is generally a bad idea after the initial construction of an interval class.

to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]

Convert to a dict usable by biocantor.io.models.TranscriptIntervalModel.

static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) TranscriptInterval

Build a TranscriptInterval from a dictionary.

static from_location(location: inscripta.biocantor.location.location.Location, cds: Optional[inscripta.biocantor.gene.cds.CDSInterval] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None) TranscriptInterval
static from_chunk_relative_location(location: inscripta.biocantor.location.location.Location, cds: Optional[inscripta.biocantor.gene.cds.CDSInterval] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None) TranscriptInterval

Allows construction of a TranscriptInterval from a chunk-relative location. This is a location present on a sequence chunk, which should be built with the convenience function seq_chunk_to_parent:

from inscripta.biocantor.io.parser import seq_chunk_to_parent
from inscripta.biocantor.location.location_impl import SingleInterval
from inscripta.biocantor.location.strand import Strand

# The 28 bp chunk covers chromosome positions 222213-222241 of NC_000913.3.
parent = seq_chunk_to_parent('AANAAATGGCGAGCACCTAACCCCCNCC', "NC_000913.3", 222213, 222241)
# Coordinates 5-20 are relative to the start of the chunk.
loc = SingleInterval(5, 20, Strand.PLUS, parent=parent)

The location can then be lifted back to chromosomal coordinates:

loc.lift_over_to_first_ancestor_of_type("chromosome")

intersect(location: inscripta.biocantor.location.location.Location, new_guid: Optional[uuid.UUID] = None, new_qualifiers: Optional[dict] = None) TranscriptInterval

Returns a new TranscriptInterval representing the intersection of this Transcript’s location with the other location.

Strand of the other location is ignored; returned Transcript is on the same strand as this Transcript.

If this Transcript has a CDS, it will be dropped because CDS intersections are not currently supported.
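
The three rules above (exons are clipped, the other strand is ignored, the CDS is dropped) can be sketched with a toy data class. This is an illustration only and not the BioCantor implementation; ToyTranscript and its fields are hypothetical, and how the real method treats an empty intersection is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


@dataclass(frozen=True)
class ToyTranscript:
    exons: List[Block]
    strand: str
    cds: Optional[List[Block]] = None

    def intersect(self, start: int, end: int, other_strand: str = "+") -> "ToyTranscript":
        """Clip the exons to [start, end); other_strand is accepted but deliberately unused."""
        kept = [
            (max(s, start), min(e, end))
            for s, e in self.exons
            if max(s, start) < min(e, end)
        ]
        # The result keeps this transcript's strand and carries no CDS.
        return ToyTranscript(exons=kept, strand=self.strand, cds=None)


tx = ToyTranscript(exons=[(10, 20), (50, 60)], strand="-", cds=[(12, 20), (50, 55)])
sub = tx.intersect(15, 52, other_strand="+")
print(sub.exons, sub.strand, sub.cds)  # [(15, 20), (50, 52)] - None
```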

sequence_pos_to_transcript(pos: int) int

Converts sequence position to relative position along this transcript.

chunk_relative_pos_to_transcript(pos: int) int

Converts chunk-relative sequence position to relative position along this transcript.
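
To make the idea of a "relative position along this transcript" concrete, here is a minimal plain-Python sketch. It assumes 0-based, half-open exon blocks and that transcript positions count from the 5’ end, so minus-strand transcripts are numbered from their highest coordinate. The function name and the choice to raise ValueError for positions outside the exons are assumptions of this sketch, not BioCantor behavior.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def toy_sequence_pos_to_transcript(exons: List[Block], strand: str, pos: int) -> int:
    """Map a sequence position to a position along the spliced transcript."""
    exons = sorted(exons)
    total = sum(end - start for start, end in exons)
    offset = 0  # exonic bases seen so far, counted from the left
    for start, end in exons:
        if start <= pos < end:
            plus_pos = offset + (pos - start)
            # On the minus strand the transcript is read right to left.
            return plus_pos if strand == "+" else total - 1 - plus_pos
        offset += end - start
    raise ValueError(f"position {pos} does not fall inside an exon")


exons = [(10, 20), (50, 60)]
print(toy_sequence_pos_to_transcript(exons, "+", 52))  # 12
print(toy_sequence_pos_to_transcript(exons, "-", 59))  # 0
```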

sequence_interval_to_transcript(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval on the sequence to a relative location within this transcript.

chunk_relative_interval_to_transcript(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval on the chunk-relative sequence to a relative location within this transcript.

transcript_pos_to_sequence(pos: int) int

Converts a relative position along this transcript to sequence coordinate.

transcript_pos_to_chunk_relative(pos: int) int

Converts a relative position along this transcript to chunk-relative sequence coordinate.
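
The reverse mapping, from a transcript position back to a sequence coordinate, can be sketched the same way. As before, this is a hedged illustration rather than the library's code: the function name, the 0-based half-open convention, and the ValueError for out-of-range positions are assumptions.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def toy_transcript_pos_to_sequence(exons: List[Block], strand: str, pos: int) -> int:
    """Map a position along the spliced transcript back to a sequence coordinate."""
    exons = sorted(exons)
    total = sum(end - start for start, end in exons)
    if not 0 <= pos < total:
        raise ValueError(f"position {pos} is outside the transcript (length {total})")
    # Convert to a left-to-right offset so both strands share one loop.
    remaining = pos if strand == "+" else total - 1 - pos
    for start, end in exons:
        size = end - start
        if remaining < size:
            return start + remaining
        remaining -= size  # skip this exon and continue into the next one
    raise AssertionError("unreachable: pos was range-checked above")


exons = [(10, 20), (50, 60)]
print(toy_transcript_pos_to_sequence(exons, "+", 12))  # 52
print(toy_transcript_pos_to_sequence(exons, "-", 0))   # 59
```

Position 12 lands in the second exon because the first exon contributes only 10 transcript bases and the intron contributes none.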

transcript_interval_to_sequence(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval relative to this transcript to a spliced location on the sequence.

transcript_interval_to_chunk_relative(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval relative to this transcript to a spliced location on the chunk-relative sequence.
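
A contiguous transcript-relative interval becomes "spliced" on the sequence whenever it crosses an exon boundary. The sketch below shows this for a plus-strand transcript only, to keep it short. It is plain Python with a hypothetical function name and assumed 0-based, half-open coordinates, and it returns a list of blocks rather than a Location object.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def toy_transcript_interval_to_sequence(
    exons: List[Block], rel_start: int, rel_end: int
) -> List[Block]:
    """Project [rel_start, rel_end) along a plus-strand transcript onto the sequence."""
    blocks = []
    offset = 0  # transcript position at which the current exon begins
    for start, end in sorted(exons):
        size = end - start
        lo = max(rel_start, offset)
        hi = min(rel_end, offset + size)
        if lo < hi:  # the interval overlaps this exon
            blocks.append((start + lo - offset, start + hi - offset))
        offset += size
    return blocks


exons = [(10, 20), (50, 60)]
print(toy_transcript_interval_to_sequence(exons, 5, 15))  # [(15, 20), (50, 55)]
print(toy_transcript_interval_to_sequence(exons, 2, 4))   # [(12, 14)]
```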

cds_pos_to_sequence(pos: int) int

Converts a relative position along the CDS to sequence coordinate.

cds_pos_to_chunk_relative(pos: int) int

Converts a relative position along the CDS to chunk-relative sequence coordinate.

cds_interval_to_sequence(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval relative to the CDS to a spliced location on the sequence.

cds_interval_to_chunk_relative(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval relative to the CDS to a spliced location on the chunk-relative sequence.

sequence_pos_to_cds(pos: int) int

Converts sequence position to relative position along the CDS.

chunk_relative_pos_to_cds(pos: int) int

Converts chunk-relative sequence position to relative position along the CDS.

sequence_interval_to_cds(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval on the sequence to a relative location within the CDS.

chunk_relative_interval_to_cds(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location

Converts a contiguous interval on the chunk-relative sequence to a relative location within the CDS.

cds_pos_to_transcript(pos: int) int

Converts a relative position along the CDS to a relative position along this transcript.

transcript_pos_to_cds(pos: int) int

Converts a relative position along this transcript to a relative position along the CDS.
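
On a plus-strand transcript, CDS-relative and transcript-relative positions differ by the length of the 5’ UTR. The following sketch illustrates that relationship under stated assumptions: plus strand only, 0-based half-open coordinates, hypothetical function names, and a ValueError (a choice of this sketch) when a transcript position lies outside the CDS.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def toy_exonic_bases_before(exons: List[Block], pos: int) -> int:
    """Count exonic bases strictly to the left of pos."""
    return sum(min(end, pos) - start for start, end in exons if start < pos)


def toy_cds_pos_to_transcript(exons: List[Block], cds_start: int, cds_pos: int) -> int:
    # The 5' UTR length is the number of exonic bases before the CDS start.
    return toy_exonic_bases_before(exons, cds_start) + cds_pos


def toy_transcript_pos_to_cds(exons: List[Block], cds_start: int, cds_end: int, tx_pos: int) -> int:
    utr5_len = toy_exonic_bases_before(exons, cds_start)
    cds_len = toy_exonic_bases_before(exons, cds_end) - utr5_len
    cds_pos = tx_pos - utr5_len
    if not 0 <= cds_pos < cds_len:
        raise ValueError(f"transcript position {tx_pos} is not inside the CDS")
    return cds_pos


exons = [(10, 20), (50, 60)]
print(toy_cds_pos_to_transcript(exons, cds_start=15, cds_pos=7))               # 12
print(toy_transcript_pos_to_cds(exons, cds_start=15, cds_end=55, tx_pos=12))  # 7
```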

get_5p_interval() inscripta.biocantor.location.location.Location

Return the 5’ UTR as a Location, if it exists.

WARNING: If this is a chunk-relative transcript, the result of this function will also be chunk-relative.

get_3p_interval() inscripta.biocantor.location.location.Location

Returns the 3’ UTR as a location, if it exists.

WARNING: If this is a chunk-relative transcript, the result of this function will also be chunk-relative.
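
Which end of the transcript is the 5’ UTR depends on the strand. The sketch below shows the idea with plain tuples. It is an illustration only: toy_utr_blocks is a hypothetical helper, coordinates are assumed 0-based and half-open, and the transcript is assumed to be coding.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def _clip(exons: List[Block], start: int, end: int) -> List[Block]:
    """Keep the parts of each exon that fall inside [start, end)."""
    return [(max(s, start), min(e, end)) for s, e in exons if max(s, start) < min(e, end)]


def toy_utr_blocks(
    exons: List[Block], strand: str, cds_start: int, cds_end: int
) -> Tuple[List[Block], List[Block]]:
    """Return (five_prime_utr, three_prime_utr) as lists of blocks."""
    tx_start = min(s for s, _ in exons)
    tx_end = max(e for _, e in exons)
    left = _clip(exons, tx_start, cds_start)   # exonic bases left of the CDS
    right = _clip(exons, cds_end, tx_end)      # exonic bases right of the CDS
    # On the minus strand the transcript is read right to left, so the ends swap.
    return (left, right) if strand == "+" else (right, left)


exons = [(10, 20), (50, 60)]
print(toy_utr_blocks(exons, "+", 15, 55))  # ([(10, 15)], [(55, 60)])
print(toy_utr_blocks(exons, "-", 15, 55))  # ([(55, 60)], [(10, 15)])
```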

get_transcript_sequence() inscripta.biocantor.sequence.sequence.Sequence

Returns the mRNA sequence.

get_cds_sequence() inscripta.biocantor.sequence.sequence.Sequence

Returns the in-frame CDS sequence (always multiple of 3).
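
One way to picture "always multiple of 3" is as trimming incomplete codons from both ends of the spliced CDS. The sketch below does this on a plain string. It is not BioCantor's algorithm: the function name is hypothetical, and leading_offset (the number of bases to skip before the first complete codon) is an assumption that may not match how CDSFrame is defined.

```python
def toy_in_frame(cds: str, leading_offset: int = 0) -> str:
    """Drop partial codons so the returned length is a multiple of 3."""
    trimmed = cds[leading_offset:]              # skip bases before the first full codon
    usable = len(trimmed) - len(trimmed) % 3    # drop a trailing partial codon, if any
    return trimmed[:usable]


print(toy_in_frame("ATGGCGAGCACCTAA"))              # ATGGCGAGCACCTAA
print(toy_in_frame("GATGGCGTA", leading_offset=1))  # ATGGCG
```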

get_protein_sequence(truncate_at_in_frame_stop: Optional[bool] = False, translation_table: Optional[inscripta.biocantor.gene.codon.TranslationTable] = TranslationTable.DEFAULT) inscripta.biocantor.sequence.sequence.Sequence

Return the translation of this transcript, if possible.
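
The effect of truncate_at_in_frame_stop can be sketched with the standard genetic code on a plain string. This is an illustrative stand-in, not the library's translation routine: toy_translate is hypothetical, only the standard table is included, and dropping the stop codon itself when truncating is an assumption of this sketch.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def toy_translate(cds: str, truncate_at_in_frame_stop: bool = False) -> str:
    """Translate complete codons; optionally stop at the first in-frame stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*" and truncate_at_in_frame_stop:
            break  # assumption: the stop itself is not emitted
        protein.append(amino_acid)
    return "".join(protein)


print(toy_translate("ATGTAAGCG"))                                  # M*A
print(toy_translate("ATGTAAGCG", truncate_at_in_frame_stop=True))  # M
```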

export_qualifiers(parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None) Dict[Hashable, Set[Hashable]]

Exports qualifiers for GFF3/GenBank export

to_gff(parent: Optional[str] = None, parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None, chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]

Yields GFF3 rows (GFFRow objects) for this transcript.

The additional qualifiers are used when writing a hierarchical relationship back to files. GFF files are easier to work with if the children features have the qualifiers of their parents.
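
The idea of children carrying their parent's qualifiers can be sketched as building a GFF3 attribute column. The code below is a simplified illustration and not how GFFRow is implemented: toy_gff3_attributes is hypothetical, RESERVED lists only a few reserved keys, values are not escaped, and dropping a reserved key after warning is a choice made for this sketch.

```python
import warnings
from typing import Dict, Optional, Set

RESERVED = {"ID", "Parent", "Name"}  # illustrative subset of GFF3 reserved attribute keys


def toy_gff3_attributes(
    feature_id: str,
    parent: Optional[str],
    qualifiers: Dict[str, Set[str]],
    parent_qualifiers: Optional[Dict[str, Set[str]]] = None,
    raise_on_reserved: bool = True,
) -> str:
    """Merge parent and child qualifiers into one GFF3 attribute string."""
    merged: Dict[str, Set[str]] = {}
    for source in (parent_qualifiers or {}, qualifiers):
        for key, values in source.items():
            if key in RESERVED:
                if raise_on_reserved:
                    raise ValueError(f"reserved GFF3 attribute in qualifiers: {key}")
                warnings.warn(f"dropping reserved GFF3 attribute: {key}")
                continue
            merged.setdefault(key, set()).update(values)
    parts = [f"ID={feature_id}"]
    if parent is not None:
        parts.append(f"Parent={parent}")
    parts += [f"{key}={','.join(sorted(values))}" for key, values in sorted(merged.items())]
    return ";".join(parts)


print(toy_gff3_attributes("tx1", "gene1", {"product": {"kinase"}}, {"gene_biotype": {"protein_coding"}}))
# ID=tx1;Parent=gene1;gene_biotype=protein_coding;product=kinase
```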

Parameters
  • parent – ID of the Parent of this transcript.

  • parent_qualifiers – Directly pull qualifiers in from this dictionary.

  • chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? If False, an exception is raised when there is no sequence_chunk ancestor type.

  • raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.

Yields

GFFRow

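Raises
  • NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
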
to_bed12(score: Optional[int] = 0, rgb: Optional[inscripta.biocantor.io.bed.RGB] = RGB(0, 0, 0), name: Optional[str] = 'transcript_symbol', chromosome_relative_coordinates: bool = True) inscripta.biocantor.io.bed.BED12

Write a BED12 format representation of this TranscriptInterval.

The score and rgb arguments are specific to the BED12 format.

Parameters
  • score – An optional score associated with an interval. UCSC requires an integer between 0 and 1000.

  • rgb – An optional RGB string for visualization on a browser. This allows you to have multiple colors on a single UCSC track.

  • name – Which identifier in this record to use as ‘name’. Defaults to transcript_symbol. If the supplied string is not a valid attribute, it is used directly.

  • chromosome_relative_coordinates – Output BED12 in chromosome-relative coordinates? If False, an exception is raised when there is no sequence_chunk ancestor type.

Returns

A BED12 object.

Raises
  • NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
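
For readers unfamiliar with the BED12 layout, the sketch below assembles the twelve tab-separated columns from exon blocks and CDS bounds. It follows the UCSC column order but is not the BED12 class from this library: toy_bed12 and its parameters are hypothetical, and the input coordinates are assumed to be 0-based and half-open.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), assumed 0-based and half-open


def toy_bed12(
    chrom: str,
    exons: List[Block],
    strand: str,
    cds_start: int,
    cds_end: int,
    name: str,
    score: int = 0,
    rgb: Tuple[int, int, int] = (0, 0, 0),
) -> str:
    """Build one BED12 line; thickStart/thickEnd mark the CDS."""
    exons = sorted(exons)
    tx_start, tx_end = exons[0][0], exons[-1][1]
    fields = [
        chrom, tx_start, tx_end, name, score, strand,
        cds_start, cds_end,
        ",".join(str(channel) for channel in rgb),
        len(exons),
        ",".join(str(end - start) for start, end in exons),     # blockSizes
        ",".join(str(start - tx_start) for start, _ in exons),  # blockStarts, relative to tx_start
    ]
    return "\t".join(str(field) for field in fields)


print(toy_bed12("chr1", [(10, 20), (50, 60)], "+", 15, 55, "tx1"))
```

For this input the line has block sizes 10,10 and block starts 0,40, because block starts are measured from the transcript start rather than from the chromosome start.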

incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) TranscriptInterval

Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new TranscriptInterval with those changes incorporated.