biocantor.gene.transcript
Object representation of Transcripts.
Each object is capable of exporting itself to BED and GFF3.
Module Contents
Classes
- class biocantor.gene.transcript.TranscriptInterval(exon_starts: List[int], exon_ends: List[int], strand: inscripta.biocantor.location.strand.Strand, cds_starts: Optional[List[int]] = None, cds_ends: Optional[List[int]] = None, cds_frames: Optional[List[inscripta.biocantor.gene.cds_frame.CDSFrame]] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)
Bases: inscripta.biocantor.gene.interval.AbstractFeatureInterval
Transcripts are a collection of Intervals. The canonical transcript has a 5’ UTR, a CDS, and a 3’ UTR. However, a transcript may lack any of those three features: it could be noncoding (no CDS) or truncated (no 3’ UTR or 5’ UTR). It could also be a single exon.
This object represents all of these possibilities through one Location that represents the exons, and an optional CDSInterval that represents the CDS.
In addition to the interval information, this object has optional metadata and an optional understanding of sequence.
- property cds_location: inscripta.biocantor.location.location.Location
Returns the Location of the CDS in chromosome coordinates
- property cds_chunk_relative_location: inscripta.biocantor.location.location.Location
Returns the Location of the CDS in chunk-relative coordinates
- property chromosome_intron_location
Returns the Location of the Introns of this Transcript in chromosome coordinates
- property chunk_relative_intron_location
Returns the Location of the Introns of this Transcript in chunk-relative coordinates
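As a standalone illustration in plain Python (not BioCantor's implementation), introns can be thought of as the gaps between consecutive exon blocks. The helper name `introns_from_exons` and the 0-based, half-open coordinate convention are assumptions made for this sketch.

```python
from typing import List, Tuple


def introns_from_exons(exon_starts: List[int], exon_ends: List[int]) -> List[Tuple[int, int]]:
    """Return the gaps between consecutive exons as (start, end) pairs.

    Hypothetical helper; assumes 0-based, half-open, non-overlapping exons.
    """
    exons = sorted(zip(exon_starts, exon_ends))
    # Each intron runs from the end of one exon to the start of the next.
    return [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


# Three exons produce two introns; a single-exon transcript produces none.
introns = introns_from_exons([2, 7, 12], [6, 10, 15])
single_exon_introns = introns_from_exons([0], [5])
```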
- property chunk_relative_cds_size: int
Chunk relative CDS size (can shrink if the Location is a slice of the full transcript)
- property cds_blocks: Iterable[inscripta.biocantor.location.location_impl.SingleInterval]
Wrapper for blocks function that reports blocks in chromosome coordinates
- property chunk_relative_cds_blocks: List[inscripta.biocantor.location.location.Location]
Wrapper for blocks function that reports blocks in chunk-relative coordinates
- property id: str
Returns the ID of this transcript. Provides a shared API across genes/transcripts and features.
- property name: str
Returns the name of this transcript. Provides a shared API across genes/transcripts and features.
- interval_type
- _identifiers = ['transcript_id', 'transcript_symbol', 'protein_id']
- __str__()
Return str(self).
- __repr__()
Return repr(self).
- __len__()
- _reset_parent(parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)
Convenience function that wraps location.reset_parent(). Overrides the parent function in AbstractInterval in order to also update the CDS interval.
- _liftover_this_location_to_seq_chunk_parent(seq_chunk_parent: inscripta.biocantor.parent.parent.Parent)
Lift this transcript interval to a new subset. Overrides the parent method in order to perform this task on the CDS interval as well.
This could happen as the result of a subsetting operation.
This will introduce chunk-relative coordinates to this interval, or reduce the size of existing chunk-relative coordinates.
This function calls the parent static method AbstractInterval.liftover_location_to_seq_chunk_parent(), but differs in two key ways:
1. It acts on an instantiated subclass of this abstract class, modifying the location.
2. It handles the case where a subclass is already a slice, by first lifting up to genomic coordinates.
For these reasons, and particularly #1, this is a private method that is intended to be used during construction of a subclass. Modifying the locations in place is generally a bad idea after initial construction of an interval class.
- to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]
Convert to a dict usable by biocantor.io.models.TranscriptIntervalModel.
- static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) TranscriptInterval
Build a TranscriptInterval from a dictionary.
- static from_location(location: inscripta.biocantor.location.location.Location, cds: Optional[inscripta.biocantor.gene.cds.CDSInterval] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None) TranscriptInterval
- static from_chunk_relative_location(location: inscripta.biocantor.location.location.Location, cds: Optional[inscripta.biocantor.gene.cds.CDSInterval] = None, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, is_primary_tx: Optional[bool] = None, transcript_id: Optional[str] = None, transcript_symbol: Optional[str] = None, transcript_type: Optional[inscripta.biocantor.gene.biotype.Biotype] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, protein_id: Optional[str] = None, product: Optional[str] = None, guid: Optional[uuid.UUID] = None, transcript_guid: Optional[uuid.UUID] = None) TranscriptInterval
Allows construction of a TranscriptInterval from a chunk-relative location. This is a location present on a sequence chunk, which should be built by the convenience function seq_chunk_to_parent:
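from inscripta.biocantor.io.parser import seq_chunk_to_parent
from inscripta.biocantor.location.location_impl import SingleInterval
from inscripta.biocantor.location.strand import Strand

# Build a parent for a 28 bp chunk of NC_000913.3 spanning 222213-222241.
parent = seq_chunk_to_parent('AANAAATGGCGAGCACCTAACCCCCNCC', "NC_000913.3", 222213, 222241)
# Positions 5-20 are relative to the start of the chunk, not the chromosome.
loc = SingleInterval(5, 20, Strand.PLUS, parent=parent)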
The location can then be lifted back to chromosomal coordinates:
loc.lift_over_to_first_ancestor_of_type("chromosome")
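The relationship between the two coordinate systems in this example can be sketched with plain arithmetic. This is only an illustration. It assumes the chunk lies on the plus strand of the chromosome and that coordinates are 0-based and half-open. `chunk_to_chromosome` is a hypothetical helper, not a BioCantor function.

```python
from typing import Tuple


def chunk_to_chromosome(chunk_start: int, rel_start: int, rel_end: int) -> Tuple[int, int]:
    """Shift a chunk-relative interval by the chunk's chromosomal start offset."""
    return (chunk_start + rel_start, chunk_start + rel_end)


# Values from the example above: the chunk covers 222213-222241 on the
# chromosome and the interval covers 5-20 within that chunk.
chunk_sequence = "AANAAATGGCGAGCACCTAACCCCCNCC"
chrom_interval = chunk_to_chromosome(222213, 5, 20)
```

The chunk length (28 bases) matches the difference between the chunk's chromosomal end and start, which is what makes the simple offset valid.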
- intersect(location: inscripta.biocantor.location.location.Location, new_guid: Optional[uuid.UUID] = None, new_qualifiers: Optional[dict] = None) TranscriptInterval
Returns a new TranscriptInterval representing the intersection of this Transcript’s location with the other location.
Strand of the other location is ignored; returned Transcript is on the same strand as this Transcript.
If this Transcript has a CDS, it will be dropped because CDS intersections are not currently supported.
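Conceptually, the intersection clips each exon to the other location and keeps whatever remains. The sketch below shows that idea on plain (start, end) tuples. It is not BioCantor's implementation, `intersect_blocks` is a hypothetical name, and it assumes 0-based, half-open coordinates and a single contiguous query interval.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based and half-open


def intersect_blocks(exons: List[Block], query: Block) -> List[Block]:
    """Clip each exon to the query interval and keep the non-empty pieces."""
    q_start, q_end = query
    clipped = []
    for start, end in exons:
        new_start, new_end = max(start, q_start), min(end, q_end)
        if new_start < new_end:  # drop exons that do not overlap the query
            clipped.append((new_start, new_end))
    return clipped


# The query trims the first and last exon and leaves the middle one intact.
result = intersect_blocks([(2, 6), (7, 10), (12, 15)], (4, 13))
```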
- sequence_pos_to_transcript(pos: int) int
Converts sequence position to relative position along this transcript.
- chunk_relative_pos_to_transcript(pos: int) int
Converts chunk-relative sequence position to relative position along this transcript.
- sequence_interval_to_transcript(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval on the sequence to a relative location within this transcript.
- chunk_relative_interval_to_transcript(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval on the chunk-relative sequence to a relative location within this transcript.
- transcript_pos_to_sequence(pos: int) int
Converts a relative position along this transcript to sequence coordinate.
- transcript_pos_to_chunk_relative(pos: int) int
Converts a relative position along this transcript to chunk-relative sequence coordinate.
- transcript_interval_to_sequence(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval relative to this transcript to a spliced location on the sequence.
- transcript_interval_to_chunk_relative(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval relative to this transcript to a spliced location on the chunk-relative sequence.
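The position conversions above amount to walking the exons in transcription order and counting bases. The following is a minimal sketch of that idea in plain Python, not BioCantor's implementation. The function names `tx_to_seq` and `seq_to_tx`, the tuple representation of exons, and the 0-based, half-open convention are assumptions for illustration.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based and half-open


def _transcription_order(exons: List[Block], plus_strand: bool) -> List[Block]:
    # Exons are read left to right on the plus strand, right to left on minus.
    return sorted(exons, reverse=not plus_strand)


def tx_to_seq(exons: List[Block], plus_strand: bool, pos: int) -> int:
    """Map a position along the spliced transcript to a sequence coordinate."""
    if pos < 0:
        raise ValueError(f"Position {pos} is outside the transcript")
    remaining = pos
    for start, end in _transcription_order(exons, plus_strand):
        size = end - start
        if remaining < size:
            return start + remaining if plus_strand else end - 1 - remaining
        remaining -= size
    raise ValueError(f"Position {pos} is outside the transcript")


def seq_to_tx(exons: List[Block], plus_strand: bool, pos: int) -> int:
    """Map a sequence coordinate to a position along the spliced transcript."""
    offset = 0
    for start, end in _transcription_order(exons, plus_strand):
        if start <= pos < end:
            return offset + (pos - start if plus_strand else end - 1 - pos)
        offset += end - start
    raise ValueError(f"Position {pos} does not fall within an exon")


exons = [(2, 6), (7, 10), (12, 15)]
plus_first = tx_to_seq(exons, True, 0)    # first base of the first exon
minus_first = tx_to_seq(exons, False, 0)  # last base of the last exon
```

In this sketch a sequence position inside an intron raises an error, since it has no transcript-relative equivalent.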
- cds_pos_to_sequence(pos: int) int
Converts a relative position along the CDS to sequence coordinate.
- cds_pos_to_chunk_relative(pos: int) int
Converts a relative position along the CDS to chunk-relative sequence coordinate.
- cds_interval_to_sequence(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval relative to the CDS to a spliced location on the sequence.
- cds_interval_to_chunk_relative(rel_start: int, rel_end: int, rel_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval relative to the CDS to a spliced location on the chunk-relative sequence.
- chunk_relative_pos_to_cds(pos: int) int
Converts chunk-relative sequence position to relative position along the CDS.
- sequence_interval_to_cds(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval on the sequence to a relative location within the CDS.
- chunk_relative_interval_to_cds(chr_start: int, chr_end: int, chr_strand: inscripta.biocantor.location.strand.Strand) inscripta.biocantor.location.location.Location
Converts a contiguous interval on the chunk-relative sequence to a relative location within the CDS.
- cds_pos_to_transcript(pos: int) int
Converts a relative position along the CDS to a relative position along this transcript.
- transcript_pos_to_cds(pos: int) int
Converts a relative position along this transcript to a relative position along the CDS.
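Converting between CDS-relative and transcript-relative positions comes down to an offset equal to the length of the 5’ UTR. The sketch below illustrates this under stated assumptions: exons are plain (start, end) tuples, the CDS is given by its outermost sequence coordinates, and all helper names are hypothetical rather than part of BioCantor.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based and half-open


def five_prime_utr_length(exons: List[Block], cds: Block, plus_strand: bool) -> int:
    """Count exonic bases that precede the CDS in transcription order."""
    cds_start, cds_end = cds
    if plus_strand:
        # Bases to the left of the CDS start.
        return sum(max(0, min(end, cds_start) - start) for start, end in exons)
    # On the minus strand, transcription begins to the right of the CDS end.
    return sum(max(0, end - max(start, cds_end)) for start, end in exons)


def cds_to_tx(pos: int, utr5_len: int) -> int:
    """Shift a CDS-relative position into transcript-relative space."""
    return pos + utr5_len


def tx_to_cds(pos: int, utr5_len: int, cds_len: int) -> int:
    """Shift a transcript-relative position into CDS-relative space."""
    cds_pos = pos - utr5_len
    if not 0 <= cds_pos < cds_len:
        raise ValueError(f"Transcript position {pos} is outside the CDS")
    return cds_pos


exons = [(0, 6), (7, 10), (12, 15)]
cds = (4, 13)  # 6 coding bases: 4-6, 7-10 and 12-13
utr5_plus = five_prime_utr_length(exons, cds, plus_strand=True)
utr5_minus = five_prime_utr_length(exons, cds, plus_strand=False)
```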
- get_5p_interval() inscripta.biocantor.location.location.Location
Return the 5’ UTR as a Location, if it exists.
WARNING: If this is a chunk-relative transcript, the result of this function will also be chunk-relative.
- get_3p_interval() inscripta.biocantor.location.location.Location
Returns the 3’ UTR as a location, if it exists.
WARNING: If this is a chunk-relative transcript, the result of this function will also be chunk-relative.
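To make the strand dependence of the UTRs concrete, here is a small sketch that splits the exonic sequence outside the CDS into 5’ and 3’ blocks. It is an illustration on plain tuples, not BioCantor's implementation. `utr_blocks` is a hypothetical helper and coordinates are assumed to be 0-based and half-open.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based and half-open


def utr_blocks(
    exons: List[Block], cds: Block, plus_strand: bool
) -> Tuple[List[Block], List[Block]]:
    """Return (five_prime_blocks, three_prime_blocks) for a coding transcript."""
    cds_start, cds_end = cds
    # Exonic pieces to the left and to the right of the CDS on the sequence.
    left = [(s, min(e, cds_start)) for s, e in exons if s < cds_start]
    right = [(max(s, cds_end), e) for s, e in exons if e > cds_end]
    # On the minus strand the 5' UTR lies to the right of the CDS.
    return (left, right) if plus_strand else (right, left)


exons = [(0, 6), (7, 10), (12, 15)]
five_plus, three_plus = utr_blocks(exons, (4, 13), plus_strand=True)
five_minus, three_minus = utr_blocks(exons, (4, 13), plus_strand=False)
```

If the CDS extends to the ends of the transcript, both lists are empty, which corresponds to the "if it exists" caveat above.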
- get_transcript_sequence() inscripta.biocantor.sequence.sequence.Sequence
Returns the mRNA sequence.
- get_cds_sequence() inscripta.biocantor.sequence.sequence.Sequence
Returns the in-frame CDS sequence (always a multiple of 3).
- get_protein_sequence(truncate_at_in_frame_stop: Optional[bool] = False, translation_table: Optional[inscripta.biocantor.gene.codon.TranslationTable] = TranslationTable.DEFAULT) inscripta.biocantor.sequence.sequence.Sequence
Return the translation of this transcript, if possible.
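The effect of truncate_at_in_frame_stop can be illustrated with a tiny translator over plain strings. This is a sketch only: it uses the standard genetic code, drops the stop codon itself when truncating, and maps unrecognized codons to X. Those three choices are assumptions of the example and may not match BioCantor's behavior.

```python
from itertools import product

# Standard genetic code, with codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, truncate_at_in_frame_stop: bool = False) -> str:
    """Translate an in-frame CDS string; hypothetical helper for illustration."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )
    if truncate_at_in_frame_stop and "*" in protein:
        protein = protein[: protein.index("*")]
    return protein


full = translate("ATGGCGTAAGGG")
truncated = translate("ATGGCGTAAGGG", truncate_at_in_frame_stop=True)
```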
- export_qualifiers(parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None) Dict[Hashable, Set[Hashable]]
Exports qualifiers for GFF3/GenBank export
- to_gff(parent: Optional[str] = None, parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None, chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]
Writes a GFF format list of lists for this transcript.
The additional qualifiers are used when writing a hierarchical relationship back to files. GFF files are easier to work with if the children features have the qualifiers of their parents.
- Parameters
parent – ID of the Parent of this transcript.
parent_qualifiers – Directly pull qualifiers in from this dictionary.
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.
raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.
- Yields
GFFRow objects.
- Raises
NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
GFF3MissingSequenceNameError – If there are no sequence names associated with this transcript.
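The raise_on_reserved_attributes switch can be pictured as a check of qualifier keys against the attribute names reserved by the GFF3 specification. The sketch below is not BioCantor's code: the function name is hypothetical, and ValueError stands in for whatever exception the library actually raises.

```python
import warnings
from typing import Dict, Hashable, List, Set

# Attribute names with predefined meanings in the GFF3 specification.
GFF3_RESERVED = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def check_reserved_attributes(
    qualifiers: Dict[Hashable, Set[str]],
    raise_on_reserved_attributes: bool = True,
) -> List[str]:
    """Raise or warn if any qualifier key collides with a reserved attribute."""
    found = sorted(set(qualifiers) & GFF3_RESERVED)
    if found:
        message = f"Qualifiers contain reserved GFF3 attributes: {found}"
        if raise_on_reserved_attributes:
            raise ValueError(message)  # stand-in exception type (assumption)
        warnings.warn(message)
    return found


# With the flag disabled, the collision is reported as a warning only.
found = check_reserved_attributes(
    {"Name": {"tx1"}, "gene": {"abc"}}, raise_on_reserved_attributes=False
)
```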
- to_bed12(score: Optional[int] = 0, rgb: Optional[inscripta.biocantor.io.bed.RGB] = RGB(0, 0, 0), name: Optional[str] = 'transcript_symbol', chromosome_relative_coordinates: bool = True) inscripta.biocantor.io.bed.BED12
Write a BED12 format representation of this TranscriptInterval.
Both of these optional arguments are specific to the BED12 format.
- Parameters
score – An optional score associated with an interval. UCSC requires an integer between 0 and 1000.
rgb – An optional RGB string for visualization on a browser. This allows you to have multiple colors on a single UCSC track.
name – Which identifier in this record to use as ‘name’. Defaults to transcript_symbol. If the supplied string is not a valid attribute, it is used directly.
chromosome_relative_coordinates – Output BED12 in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.
- Returns
A BED12 object.
- Raises
NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no sequence_chunk ancestor type.
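For readers unfamiliar with BED12, the following sketch shows how the twelve columns relate to exon and CDS coordinates. It builds a tab-separated line from plain tuples and does not use BioCantor's BED12 or RGB classes. The function name, the chromosome name, and the convention of setting thickStart and thickEnd to chromStart for noncoding transcripts are assumptions of this example.

```python
from typing import List, Optional, Tuple

Block = Tuple[int, int]  # (start, end), 0-based and half-open, as in BED


def bed12_line(
    chrom: str,
    exons: List[Block],
    strand: str,
    name: str,
    cds: Optional[Block] = None,
    score: int = 0,
    rgb: Tuple[int, int, int] = (0, 0, 0),
) -> str:
    """Format one transcript as a BED12 line (illustrative helper)."""
    exons = sorted(exons)
    chrom_start, chrom_end = exons[0][0], exons[-1][1]
    # thickStart/thickEnd mark the coding region; collapse them if noncoding.
    thick_start, thick_end = cds if cds else (chrom_start, chrom_start)
    block_sizes = ",".join(str(end - start) for start, end in exons)
    # Block starts are relative to chromStart, not to the chromosome.
    block_starts = ",".join(str(start - chrom_start) for start, _ in exons)
    fields = [
        chrom, chrom_start, chrom_end, name, score, strand,
        thick_start, thick_end, ",".join(map(str, rgb)),
        len(exons), block_sizes, block_starts,
    ]
    return "\t".join(map(str, fields))


line = bed12_line("chr1", [(2, 6), (7, 10), (12, 15)], "+", "tx1", cds=(4, 13))
```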
- incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) TranscriptInterval
Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new TranscriptInterval with those changes incorporated.