biocantor.gene.feature

Object representation of features. Includes an abstract feature class that is also used by transcripts.

Each object is capable of exporting itself to BED and GFF3.

Module Contents

Classes

FeatureInterval

FeatureIntervals are generic intervals. These can be used to model genome promoters,

FeatureIntervalCollection

A FeatureIntervalCollection is arbitrary container of intervals.

class biocantor.gene.feature.FeatureInterval(interval_starts: List[int], interval_ends: List[int], strand: inscripta.biocantor.location.strand.Strand, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, feature_types: Optional[List[str]] = None, feature_name: Optional[str] = None, feature_id: Optional[str] = None, guid: Optional[uuid.UUID] = None, feature_guid: Optional[uuid.UUID] = None, is_primary_feature: Optional[bool] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)

Bases: inscripta.biocantor.gene.interval.AbstractFeatureInterval

FeatureIntervals are generic intervals. These can be used to model genome promoters, open chromatin sites, etc.

property id: str

Returns the ID of this feature. Provides a shared API across genes/transcripts and features.

property name: str

Returns the name of this feature. Provides a shared API across genes/transcripts and features.

property cds_start: int
property cds_end: int
property chunk_relative_cds_start: int
property chunk_relative_cds_end: int
property cds_location: inscripta.biocantor.location.location.Location

Returns the Location of the CDS in chromosome coordinates

property cds_chunk_relative_location: inscripta.biocantor.location.location.Location

Returns the Location of the CDS in chunk relative coordinates

property is_coding: bool
property has_in_frame_stop: bool
property cds_size: int

CDS size, regardless of chunk relativity (does not shrink)

property chunk_relative_cds_size: int

Chunk relative CDS size (can shrink if the Location is a slice of the full transcript)

interval_type
_identifiers = ['feature_name', 'feature_id']
__str__()

Return str(self).

__repr__()

Return repr(self).

to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]

Convert to a dict usable by biocantor.io.models.FeatureIntervalModel.

static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) FeatureInterval

Build a FeatureInterval from a dictionary.

static from_location(location: inscripta.biocantor.location.location.Location, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, guid: Optional[uuid.UUID] = None, feature_guid: Optional[uuid.UUID] = None, feature_types: Optional[List[str]] = None, feature_id: Optional[str] = None, feature_name: Optional[str] = None, is_primary_feature: Optional[bool] = None) FeatureInterval
static from_chunk_relative_location(location: inscripta.biocantor.location.location.Location, qualifiers: Optional[Dict[Hashable, inscripta.biocantor.gene.interval.QualifierValue]] = None, sequence_guid: Optional[uuid.UUID] = None, sequence_name: Optional[str] = None, guid: Optional[uuid.UUID] = None, feature_guid: Optional[uuid.UUID] = None, feature_types: Optional[List[str]] = None, feature_id: Optional[str] = None, feature_name: Optional[str] = None, is_primary_feature: Optional[bool] = None) FeatureInterval

Allows construction of a FeatureInterval from a chunk-relative location. This is a location present on a sequence chunk, which should be built by the convenience function seq_chunk_to_parent:

from inscripta.biocantor.io.parser import seq_chunk_to_parent
parent = seq_chunk_to_parent('AANAAATGGCGAGCACCTAACCCCCNCC', "NC_000913.3", 222213, 222241)
loc = SingleInterval(5, 20, Strand.PLUS, parent=parent)

And then, this can be lifted back to chromosomal coordinates like such:

loc.lift_over_to_first_ancestor_of_type("chromosome")
intersect(location: inscripta.biocantor.location.location.Location, new_guid: Optional[uuid.UUID] = None, new_qualifiers: Optional[dict] = None) FeatureInterval

Returns a new FeatureInterval representing the intersection of this FeatureInterval’s location with the other location.

Strand of the other location is ignored; returned FeatureInterval is on the same strand as this FeatureInterval.

export_qualifiers(parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None) Dict[Hashable, Set[str]]

Exports qualifiers for GFF3/GenBank export

to_gff(parent: Optional[str] = None, parent_qualifiers: Optional[Dict[Hashable, Set[str]]] = None, chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]

Writes a GFF format list of lists for this feature.

The additional qualifiers are used when writing a hierarchical relationship back to files. GFF files are easier to work with if the children features have the qualifiers of their parents.

Parameters
  • parent – ID of the Parent of this transcript.

  • parent_qualifiers – Directly pull qualifiers in from this dictionary.

  • chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.

  • raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.

Yields

GFFRow

Raises
to_bed12(score: Optional[int] = 0, rgb: Optional[inscripta.biocantor.io.bed.RGB] = RGB(0, 0, 0), name: Optional[str] = 'feature_name', chromosome_relative_coordinates: bool = True) inscripta.biocantor.io.bed.BED12

Write a BED12 format representation of this FeatureInterval.

Both of these optional arguments are specific to the BED12 format.

Parameters
  • score – An optional score associated with a interval. UCSC requires an integer between 0 and 1000.

  • rgb – An optional RGB string for visualization on a browser. This allows you to have multiple colors on a single UCSC track.

  • name – Which identifier in this record to use as ‘name’. feature_name to guid. If the supplied string is not a valid attribute, it is used directly.

  • chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.

Returns

A BED12 object.

Raises
  • NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no

  • sequence_chunk` ancestor type

incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) FeatureInterval

Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new FeatureInterval with those changes incorporated.

class biocantor.gene.feature.FeatureIntervalCollection(feature_intervals: List[FeatureInterval], feature_collection_name: Optional[str] = None, feature_collection_id: Optional[str] = None, feature_collection_type: Optional[str] = None, locus_tag: Optional[str] = None, sequence_name: Optional[str] = None, sequence_guid: Optional[uuid.UUID] = None, guid: Optional[uuid.UUID] = None, qualifiers: Optional[Dict[Hashable, List[inscripta.biocantor.gene.interval.QualifierValue]]] = None, parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None)

Bases: inscripta.biocantor.gene.interval.AbstractFeatureIntervalCollection

A FeatureIntervalCollection is arbitrary container of intervals.

This can be thought of to be analogous to a GeneInterval, but for non-transcribed features that are grouped in some fashion. An example is transcription factor binding sites for a specific transcription factor.

The Strand of this feature interval collection is always the plus strand.

This cannot be empty; it must have at least one feature interval.

property is_coding: bool

Never coding.

property children_guids: Set[uuid.UUID]

Get all of the GUIDs for children.

Returns: A set of UUIDs

property id: str

Returns the ID of this feature collection. Provides a shared API across genes/transcripts and features.

property name: str

Returns the name of this feature collection. Provides a shared API across genes/transcripts and features.

interval_type
_identifiers = ['feature_collection_id', 'feature_collection_name', 'locus_tag']
__repr__()

Return repr(self).

iter_children() Iterable[FeatureInterval]

Iterate over all intervals in this collection.

get_primary_feature() Union[FeatureInterval, None]

Get the primary feature, if it exists.

get_primary_feature_sequence() Union[inscripta.biocantor.sequence.Sequence, None]

Convenience function that provides shared API between features and transcripts

get_merged_feature() FeatureInterval

Generate a single FeatureInterval that merges all intervals together.

to_dict(chromosome_relative_coordinates: bool = True) Dict[str, Any]

Convert to a dict usable by FeatureIntervalCollectionModel.

static from_dict(vals: Dict[str, Any], parent_or_seq_chunk_parent: Optional[inscripta.biocantor.parent.parent.Parent] = None) FeatureIntervalCollection

Build a FeatureIntervalCollection from a dictionary representation

export_qualifiers() Dict[Hashable, Set[str]]

Exports qualifiers for GFF3/GenBank export

query_by_guids(id_or_ids: Union[uuid.UUID, List[uuid.UUID]]) Optional[FeatureIntervalCollection]

Filter this feature collection object by a list of unique IDs.

Parameters

id_or_ids – List of GUIDs, or unique IDs. Can also be a single ID.

Returns

FeatureIntervalCollection, or None if there are no matching GUIDs.

to_gff(chromosome_relative_coordinates: bool = True, raise_on_reserved_attributes: Optional[bool] = True) Iterator[inscripta.biocantor.io.gff3.rows.GFFRow]

Produces iterable of GFFRow for this feature collection and its children.

Parameters
  • chromosome_relative_coordinates – Output GFF in chromosome-relative coordinates? Will raise an exception if there is not a sequence_chunk ancestor type.

  • raise_on_reserved_attributes – If True, then GFF3 reserved attributes such as ID and Name present in the qualifiers will lead to an exception and not a warning.

Yields

GFFRow

Raises
  • NoSuchAncestorException – If chromosome_relative_coordinates is False but there is no

  • sequence_chunk` ancestor type

incorporate_variants(variants: Union[inscripta.biocantor.gene.variants.VariantInterval, inscripta.biocantor.gene.variants.VariantIntervalCollection]) FeatureIntervalCollection

Incorporate all of the variant(s) for an input VariantInterval or VariantIntervalCollection, producing a new FeatureIntervalCollection with those changes incorporated on every child.