biocantor.gene.feature
Object representation of features. Includes an abstract feature class that is also used by transcripts.
Each object is capable of exporting itself to BED and GFF3.
Module Contents
Classes
FeatureIntervals are generic intervals. These can be used to model genome promoters, |
|
A FeatureIntervalCollection is arbitrary container of intervals. |
- class biocantor.gene.feature.FeatureInterval(interval_starts: List[int], interval_ends: List[int], strand: inscripta.biocantor.location.strand.Strand, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, feature_types: Optional[List[str]] = None, feature_name: Optional[str] = None, feature_id: Optional[str] = None, guid: Optional[uuid.UUID] = None, feature_guid: Optional[uuid.UUID] = None, is_primary_feature: Optional[bool] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)
Bases:
inscripta.biocantor.gene.interval.AbstractFeatureInterval
FeatureIntervals are generic intervals. These can be used to model genome promoters, open chromatin sites, etc.
- property id: str
Returns the ID of this feature. Provides a shared API across genes/transcripts and features.
- property name: str
Returns the name of this feature. Provides a shared API across genes/transcripts and features.
- property cds_location: inscripta.biocantor.location.location.Location
Returns the Location of the CDS in chromosome coordinates
- property cds_chunk_relative_location: inscripta.biocantor.location.location.Location
Returns the Location of the CDS in chunk relative coordinates
- property chunk_relative_cds_size: int
Chunk relative CDS size (can shrink if the Location is a slice of the full transcript)
- interval_type
- _identifiers = ['feature_name', 'feature_id']
- __str__()
Return str(self).
- __repr__()
Return repr(self).
- to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]
Convert to a dict usable by
biocantor.io.models.FeatureIntervalModel
.
- static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) FeatureInterval
Build a
FeatureInterval
from a dictionary.
- static from_location(location: inscripta.biocantor.location.location.Location, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, guid: Optional[uuid.UUID] = None, feature_guid: Optional[uuid.UUID] = None, feature_types: Optional[List[str]] = None, feature_id: Optional[str] = None, feature_name: Optional[str] = None, is_primary_feature: Optional[bool] = None) FeatureInterval
- static from_chunk_relative_location(location: inscripta.biocantor.location.location.Location, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, guid: Optional[uuid.UUID] = None, feature_guid: Optional[uuid.UUID] = None, feature_types: Optional[List[str]] = None, feature_id: Optional[str] = None, feature_name: Optional[str] = None, is_primary_feature: Optional[bool] = None) FeatureInterval
Allows construction of a FeatureInterval from a chunk-relative location. This is a location present on a sequence chunk, which should be built by the convenience function seq_chunk_to_parent:
from inscripta.biocantor.io.parser import seq_chunk_to_parent parent = seq_chunk_to_parent('AANAAATGGCGAGCACCTAACCCCCNCC', "NC_000913.3", 222213, 222241) loc = SingleInterval(5, 20, Strand.PLUS, parent=parent)
And then, this can be lifted back to chromosomal coordinates like such:
loc.lift_over_to_first_ancestor_of_type("chromosome")
- intersect(location: inscripta.biocantor.location.location.Location, new_guid: Optional[uuid.UUID] = None, new_qualifiers: Optional[dict] = None) FeatureInterval
Returns a new FeatureInterval representing the intersection of this FeatureInterval’s location with the other location.
Strand of the other location is ignored; returned FeatureInterval is on the same strand as this FeatureInterval.
- export_qualifiers(parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None) Dict[Hashable, Set[str]]
Exports qualifiers for GFF3/GenBank export
- to_gff(parent: Optional[str] = None, parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None, chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]
Writes a GFF format list of lists for this feature.
The additional qualifiers are used when writing a hierarchical relationship back to files. GFF files are easier to work with if the children features have the qualifiers of their parents.
- Parameters
parent – ID of the Parent of this transcript.
parent_qualifiers – Directly pull qualifiers in from this dictionary.
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a
sequence_chunk
ancestor type.raise_on_reserved_attributes – If
True
, then GFF3 reserved attributes such asID
andName
present in the qualifiers will lead to an exception and not a warning.
- Yields
- Raises
NoSuchAncestorException – If
chromosome_relative_coordinates
isFalse
but there is nosequence_chunk` ancestor type –
GFF3MissingSequenceNameError – If there are no sequence names associated with this feature.
- to_bed12(score: Optional[int] = 0, rgb: Optional[inscripta.biocantor.io.bed.RGB] = RGB(0, 0, 0), name: Optional[str] = 'feature_name', chromosome_relative_coordinates: bool = True) inscripta.biocantor.io.bed.BED12
Write a BED12 format representation of this
FeatureInterval
.Both of these optional arguments are specific to the BED12 format.
- Parameters
score – An optional score associated with a interval. UCSC requires an integer between 0 and 1000.
rgb – An optional RGB string for visualization on a browser. This allows you to have multiple colors on a single UCSC track.
name – Which identifier in this record to use as ‘name’. feature_name to guid. If the supplied string is not a valid attribute, it is used directly.
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a
sequence_chunk
ancestor type.
- Returns
A
BED12
object.- Raises
NoSuchAncestorException – If
chromosome_relative_coordinates
isFalse
but there is nosequence_chunk` ancestor type –
- incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) FeatureInterval
Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new FeatureInterval with those changes incorporated.
- class biocantor.gene.feature.FeatureIntervalCollection(feature_intervals: List[FeatureInterval], feature_collection_name: Optional[str] = None, feature_collection_id: Optional[str] = None, feature_collection_type: Optional[str] = None, locus_tag: Optional[str] = None, sequence_name: Optional[str] = None, sequence_guid: Optional[uuid.UUID] = None, guid: Optional[uuid.UUID] = None, qualifiers: Optional[Dict[Hashable, List[inscripta.biocantor.gene.interval.QualifierValue]]] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)
Bases:
inscripta.biocantor.gene.interval.AbstractFeatureIntervalCollection
A FeatureIntervalCollection is arbitrary container of intervals.
This can be thought of to be analogous to a
GeneInterval
, but for non-transcribed features that are grouped in some fashion. An example is transcription factor binding sites for a specific transcription factor.The
Strand
of this feature interval collection is always the plus strand.This cannot be empty; it must have at least one feature interval.
- property id: str
Returns the ID of this feature collection. Provides a shared API across genes/transcripts and features.
- property name: str
Returns the name of this feature collection. Provides a shared API across genes/transcripts and features.
- interval_type
- _identifiers = ['feature_collection_id', 'feature_collection_name', 'locus_tag']
- __repr__()
Return repr(self).
- iter_children() Iterable[FeatureInterval]
Iterate over all intervals in this collection.
- get_primary_feature() Union[FeatureInterval, None]
Get the primary feature, if it exists.
- get_primary_feature_sequence() Union[inscripta.biocantor.sequence.Sequence, None]
Convenience function that provides shared API between features and transcripts
- get_merged_feature() FeatureInterval
Generate a single
FeatureInterval
that merges all intervals together.
- to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]
Convert to a dict usable by
FeatureIntervalCollectionModel
.
- static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) FeatureIntervalCollection
Build a
FeatureIntervalCollection
from a dictionary representation
- query_by_guids(id_or_ids: Union[uuid.UUID, List[uuid.UUID]]) Optional[FeatureIntervalCollection]
Filter this feature collection object by a list of unique IDs.
- Parameters
id_or_ids – List of GUIDs, or unique IDs. Can also be a single ID.
- Returns
FeatureIntervalCollection
, or None if there are no matching GUIDs.
- to_gff(chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]
Produces iterable of
GFFRow
for this feature collection and its children.- Parameters
chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a
sequence_chunk
ancestor type.raise_on_reserved_attributes – If
True
, then GFF3 reserved attributes such asID
andName
present in the qualifiers will lead to an exception and not a warning.
- Yields
- Raises
NoSuchAncestorException – If
chromosome_relative_coordinates
isFalse
but there is nosequence_chunk` ancestor type –
- incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) FeatureIntervalCollection
Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new FeatureIntervalCollection with those changes incorporated on every child.