biocantor

Package Contents

Classes

SequenceType

String-valued enumeration of sequence types.

DistanceType

Enumeration of distance types.

AbstractLocation

Shared AbstractLocation base class simplifies imports for type checking

AbstractSequence

Shared AbstractSequence base class simplifies imports for type checking

AbstractParent

Shared AbstractParent base class simplifies imports for type checking

Attributes

__version__

Strand

Alphabet

biocantor.__version__ = 0.19.0
biocantor.Strand
biocantor.Alphabet
class biocantor.SequenceType

Bases: str, enum.Enum

String-valued enumeration of sequence types. Because the class also derives from str, each member compares equal to its string value.

CHROMOSOME = chromosome
SEQUENCE_CHUNK = sequence_chunk
static sequence_type_str_to_type(sequence_type: Optional[str]) Optional[Union[SequenceType, str]]

Convenience function to convert a str to a SequenceType, if possible
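
The return annotation suggests three outcomes: None, a SequenceType member, or a plain str. The sketch below is one way such a conversion could behave. It uses a stand-in enum rather than importing biocantor, and the pass-through of unrecognized strings is an assumption inferred from the annotation, not confirmed behavior.

```python
from enum import Enum
from typing import Optional, Union


class SequenceType(str, Enum):
    """Stand-in mirroring the two members documented above."""

    CHROMOSOME = "chromosome"
    SEQUENCE_CHUNK = "sequence_chunk"


def sequence_type_str_to_type(
    sequence_type: Optional[str],
) -> Optional[Union[SequenceType, str]]:
    """Hypothetical re-implementation for illustration only."""
    if sequence_type is None:
        return None
    try:
        # Enum lookup by value succeeds for known sequence types.
        return SequenceType(sequence_type)
    except ValueError:
        # Assumption: unrecognized strings are returned unchanged.
        return sequence_type


converted = sequence_type_str_to_type("chromosome")
unknown = sequence_type_str_to_type("plasmid")
```

Because the stand-in mixes in str, converted also compares equal to the plain string "chromosome".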

class biocantor.DistanceType

Bases: enum.Enum

Enumeration of the distance types accepted by AbstractLocation.distance_to.

INNER = inner
OUTER = outer
STARTS = starts
ENDS = ends
class biocantor.AbstractLocation

Bases: abc.ABC

Shared AbstractLocation base class simplifies imports for type checking

property parent_id: str

Returns the parent ID

property parent_type: str

Returns the sequence type of the parent

abstract property is_contiguous: bool

Returns True iff this Location is fully contiguous within its parent

abstract property is_empty: bool

Returns True iff this Location is empty

abstract property blocks: List[AbstractLocation]

Returns list of contiguous blocks comprising this Location

abstract property is_overlapping: bool

Returns True if this interval contains overlaps; always False for SingleInterval

abstract property _full_span_interval: AbstractLocation

Returns the full span of this interval; is trivial for a SingleInterval and EmptyLocation

abstract property num_blocks: int

Returns number of contiguous blocks comprising this Location

__slots__ = ['start', 'end', 'strand', 'parent', 'length', '_sequence']
start :int
end :int
strand :Strand
parent :Optional[AbstractParent]
length :int
__len__()

Returns the length (number of positions) of this Location. For subclasses representing discontiguous locations, regions between blocks are not considered.

abstract __str__()

Returns a human readable string representation of this Location

abstract __eq__(other)

Returns True iff this Location is equal to other object

abstract __hash__()

Returns a hash code satisfying location1 == location2 => hash(location1) == hash(location2)
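
One common way to satisfy this contract is to derive both __eq__ and __hash__ from the same tuple of fields. The Interval class below is a hypothetical stand-in, not a biocantor class, and the choice of fields is an assumption.

```python
class Interval:
    """Hypothetical stand-in for a concrete Location."""

    __slots__ = ("start", "end", "strand")

    def __init__(self, start: int, end: int, strand: str) -> None:
        self.start = start
        self.end = end
        self.strand = strand

    def _key(self):
        # Single source of truth for both equality and hashing.
        return (self.start, self.end, self.strand)

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Interval):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self) -> int:
        # Equal objects share a key, so they share a hash.
        return hash(self._key())


a = Interval(5, 10, "+")
b = Interval(5, 10, "+")
unique = {a, b}
```

With this pattern, equal locations collapse to a single entry in sets and dict keys.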

abstract __repr__()

Returns the ‘official’ string representation of this Location

abstract scan_blocks() Iterator[AbstractLocation]

Returns an iterator over blocks in order relative to strand of this Location

abstract optimize_blocks() AbstractLocation

Returns a new Location covering the same positions but with blocks optimized. For example, empty blocks may be removed or adjacent blocks may be combined if applicable.
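
As a rough illustration of what block optimization can mean, the sketch below drops empty blocks and merges adjacent or overlapping ones. Blocks are modeled as (start, end) tuples with an exclusive end. The function is hypothetical, and concrete biocantor subclasses may apply different rules.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, end exclusive


def optimize_blocks(blocks: List[Block]) -> List[Block]:
    """Hypothetical sketch: remove empty blocks, merge touching or overlapping ones."""
    merged: List[Block] = []
    non_empty = sorted(block for block in blocks if block[1] > block[0])
    for start, end in non_empty:
        if merged and start <= merged[-1][1]:
            # Adjacent (start == previous end) or overlapping: extend the last block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


optimized = optimize_blocks([(0, 5), (5, 8), (10, 10), (12, 15)])
```

The merged result covers the same positions as the input.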

abstract gap_list() List[AbstractLocation]

Returns list of contiguous regions comprising the space between blocks of this Location. List is ordered relative to strand of this Location.
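
A minimal sketch of the idea, assuming non-overlapping blocks given as (start, end) tuples with an exclusive end. The function name and the boolean strand flag are illustrative and are not biocantor's API.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, end exclusive


def gap_list(blocks: List[Block], minus_strand: bool = False) -> List[Block]:
    """Hypothetical sketch: regions between consecutive non-overlapping blocks."""
    ordered = sorted(blocks)
    gaps = [
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(ordered, ordered[1:])
        if right_start > left_end  # skip adjacent blocks, which leave no gap
    ]
    # Order relative to strand: reverse the list for the minus strand.
    return gaps[::-1] if minus_strand else gaps


plus_gaps = gap_list([(0, 5), (10, 15), (20, 25)])
minus_gaps = gap_list([(0, 5), (10, 15), (20, 25)], minus_strand=True)
```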

abstract gaps_location() AbstractLocation

Returns a Location representing the space between blocks of this Location.

abstract extract_sequence() AbstractSequence

Extracts the sequence of this Location from the parent. Concrete implementations should raise ValueError if no parent exists.

abstract parent_to_relative_pos(parent_pos: int) int

Converts a position on the parent to a position relative to this Location. Concrete implementations should raise ValueError if the given position does not overlap this Location.

abstract relative_to_parent_pos(relative_pos: int) int

Converts a position relative to this Location to a position on the parent
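
For a single contiguous interval, the two conversions can be sketched as below. The Interval class is a hypothetical stand-in. Treating relative position 0 as the highest parent coordinate on the minus strand, and raising ValueError for an out-of-range relative position, are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical contiguous location: 0-based start, exclusive end."""

    start: int
    end: int
    minus_strand: bool = False

    def parent_to_relative_pos(self, parent_pos: int) -> int:
        if not self.start <= parent_pos < self.end:
            raise ValueError(f"Parent position {parent_pos} does not overlap this interval")
        if self.minus_strand:
            # On the minus strand, relative position 0 is the last parent position.
            return self.end - 1 - parent_pos
        return parent_pos - self.start

    def relative_to_parent_pos(self, relative_pos: int) -> int:
        if not 0 <= relative_pos < self.end - self.start:
            raise ValueError(f"Relative position {relative_pos} is out of range")
        if self.minus_strand:
            return self.end - 1 - relative_pos
        return self.start + relative_pos


plus = Interval(10, 20)
minus = Interval(10, 20, minus_strand=True)
plus_rel = plus.parent_to_relative_pos(12)
minus_rel = minus.parent_to_relative_pos(12)
```

The two methods are inverses of each other for any position inside the interval.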

abstract parent_to_relative_location(parent_location: AbstractLocation, optimize_blocks: bool = True) AbstractLocation

Converts a Location on the parent to a Location relative to this Location.

Parameters
  • parent_location – Location with the same parent as this Location. Both parents can be None.

  • optimize_blocks – Run optimize_blocks on the resulting location?

Returns

New Location relative to this Location.

Return type

AbstractLocation

location_relative_to(other: AbstractLocation, optimize_blocks: bool = True) AbstractLocation

Converts this Location to a Location relative to another Location. The Locations must overlap. The returned value represents the relative location of the overlap within the other Location.

If optimize_blocks is True, the resulting Location will not have any adjacent or overlapping intervals. This is often desirable, because the output of this function can have unexpected coordinates when the locations are overlapping or adjacent. However, in some cases it is desirable to retain the original block structure. One example is a CDS, where adjacent or overlapping blocks are used to model frameshifts or indels.

abstract _location_relative_to(other: AbstractLocation, optimize_blocks: bool = True) AbstractLocation
abstract relative_interval_to_parent_location(relative_start: int, relative_end: int, relative_strand: Strand) AbstractLocation

Converts an interval relative to this Location to a Location on the parent

Parameters
  • relative_start – 0-based start position of interval relative to this Location

  • relative_end – 0-based exclusive end position of interval relative to this Location

  • relative_strand – Strand of interval relative to the strand of this Location. If the strand of interval is on the SAME strand as the strand of this location, relative_strand is PLUS. If the strand interval is on the OPPOSITE strand, relative_strand is MINUS.

Returns

New Location on the parent with the parent as parent

Return type

AbstractLocation
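
The relative_strand rule above can be illustrated for a single contiguous location. In this sketch, strands are the strings "+" and "-" rather than Strand members, and intervals are plain tuples. The function is hypothetical and is not part of biocantor.

```python
from typing import Tuple


def relative_interval_to_parent(
    loc_start: int,
    loc_end: int,
    loc_strand: str,
    rel_start: int,
    rel_end: int,
    rel_strand: str,
) -> Tuple[int, int, str]:
    """Hypothetical sketch for one contiguous location; ends are exclusive."""
    if not 0 <= rel_start <= rel_end <= loc_end - loc_start:
        raise ValueError("Relative interval does not fit within the location")
    if loc_strand == "+":
        parent_start, parent_end = loc_start + rel_start, loc_start + rel_end
    else:
        # Relative coordinates count backwards from the end on the minus strand.
        parent_start, parent_end = loc_end - rel_end, loc_end - rel_start
    # A relative "+" keeps the location's strand; a relative "-" flips it.
    parent_strand = "+" if loc_strand == rel_strand else "-"
    return parent_start, parent_end, parent_strand


same_on_plus = relative_interval_to_parent(10, 20, "+", 2, 5, "+")
same_on_minus = relative_interval_to_parent(10, 20, "-", 2, 5, "+")
opposite_on_minus = relative_interval_to_parent(10, 20, "-", 2, 5, "-")
```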

abstract scan_windows(window_size: int, step_size: int, start_pos: int = 0) Iterator[AbstractLocation]

Returns an iterator over fixed size windows within this Location. Windows represent sub-regions of this Location and are with respect to the same parent as this Location. The final window returned is the last one that fits completely within this Location. Returned windows are in order according to relative position within this Location; i.e., corresponding to the strand of this Location.

Parameters
  • window_size

  • step_size

  • start_pos – 0-based relative start position of first window relative to this Location
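
A sketch of the windowing behavior described above, for a single contiguous location represented by plain integers. The function and its tuple output are illustrative only. Interpreting step_size as the distance between consecutive window starts is an assumption.

```python
from typing import Iterator, Tuple


def scan_windows(
    start: int,
    end: int,
    minus_strand: bool,
    window_size: int,
    step_size: int,
    start_pos: int = 0,
) -> Iterator[Tuple[int, int]]:
    """Hypothetical sketch: yield (start, end) parent coordinates of each window."""
    if window_size < 1 or step_size < 1:
        raise ValueError("window_size and step_size must be positive")
    length = end - start
    rel = start_pos
    # Stop once a window would no longer fit completely within the location.
    while rel + window_size <= length:
        if minus_strand:
            # Relative position 0 is at the high end of a minus-strand location.
            yield (end - rel - window_size, end - rel)
        else:
            yield (start + rel, start + rel + window_size)
        rel += step_size


plus_windows = list(scan_windows(0, 10, False, window_size=4, step_size=3))
minus_windows = list(scan_windows(0, 10, True, window_size=4, step_size=3))
```

On the minus strand the same windows are produced in the opposite order, matching the strand-relative ordering described above.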

abstract has_overlap(other: AbstractLocation, match_strand: bool = False, full_span: bool = False, strict_parent_compare: bool = False) bool

Returns True iff this Location shares at least one position with the given Location. For subclasses representing discontiguous locations, regions between blocks are not considered.

Parameters
  • other – Other Location

  • match_strand – If set to True, automatically return False if given interval Strand does not match this Location’s Strand

  • full_span – If set to True, compare the full span of this Location to the full span of the other Location.

  • strict_parent_compare – Raise MismatchedParentException if parents do not match

Returns

True if there is any overlap, False otherwise

Return type

bool
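
The effect of match_strand and full_span can be sketched with locations modeled as non-empty lists of (start, end) blocks with exclusive ends. This is a hypothetical illustration. Parent comparison (strict_parent_compare) is omitted.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end), 0-based, end exclusive


def full_span(blocks: List[Block]) -> List[Block]:
    """Collapse a non-empty block list to the single interval spanning all blocks."""
    return [(min(s for s, _ in blocks), max(e for _, e in blocks))]


def has_overlap(
    a_blocks: List[Block],
    b_blocks: List[Block],
    a_strand: str = "+",
    b_strand: str = "+",
    match_strand: bool = False,
    use_full_span: bool = False,
) -> bool:
    """Hypothetical sketch of block-wise overlap testing."""
    if match_strand and a_strand != b_strand:
        return False
    if use_full_span:
        a_blocks, b_blocks = full_span(a_blocks), full_span(b_blocks)
    # Half-open intervals share a position iff each starts before the other ends.
    return any(
        a_start < b_end and b_start < a_end
        for a_start, a_end in a_blocks
        for b_start, b_end in b_blocks
    )


exons = [(0, 5), (10, 15)]
intronic = [(6, 9)]
blockwise = has_overlap(exons, intronic)
spanwise = has_overlap(exons, intronic, use_full_span=True)
```

Here the second location falls in the gap between blocks, so it overlaps only when full spans are compared.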

abstract reverse() AbstractLocation

Returns a new Location corresponding to this Location with the same start and stop, with strand and structure reversed

abstract reverse_strand() AbstractLocation

Returns a new Location corresponding to this Location with the strand reversed

abstract reset_strand(new_strand: Strand) AbstractLocation

Returns a new Location corresponding to this Location with the given strand

abstract reset_parent(new_parent: Optional[AbstractParent]) AbstractLocation

Returns a new Location corresponding to this Location with positions unchanged and pointing to a new parent

abstract shift_position(shift: int) AbstractLocation

Returns a new Location corresponding to this location shifted by the given distance

abstract distance_to(other: AbstractLocation, distance_type: DistanceType = DistanceType.INNER) int

Returns the distance from this location to another location with the same parent. Return value is a non-negative integer and implementations must be commutative.

Parameters
  • other – Other location with same parent as this location

  • distance_type – Distance type
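
The reference above does not define the four distance types. The sketch below shows one plausible interpretation for two contiguous intervals, in which every variant is non-negative and commutative as required. The formulas are assumptions made for illustration and should be checked against the concrete implementations.

```python
from enum import Enum
from typing import Tuple


class DistanceType(Enum):
    """Stand-in mirroring the documented members."""

    INNER = "inner"
    OUTER = "outer"
    STARTS = "starts"
    ENDS = "ends"


def distance(a: Tuple[int, int], b: Tuple[int, int], distance_type: DistanceType = DistanceType.INNER) -> int:
    """Hypothetical distances between two (start, end) intervals, end exclusive."""
    (a_start, a_end), (b_start, b_end) = a, b
    if distance_type is DistanceType.STARTS:
        return abs(a_start - b_start)
    if distance_type is DistanceType.ENDS:
        return abs(a_end - b_end)
    if distance_type is DistanceType.OUTER:
        # Assumed: extent from the leftmost start to the rightmost end.
        return max(a_end, b_end) - min(a_start, b_start)
    # Assumed INNER: size of the gap between the intervals, 0 if they touch or overlap.
    return max(0, max(a_start, b_start) - min(a_end, b_end))


first, second = (0, 5), (10, 20)
inner = distance(first, second)
outer = distance(first, second, DistanceType.OUTER)
```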

abstract merge_overlapping() AbstractLocation

Returns a new Location with overlapping blocks merged

abstract to_biopython() Union[Bio.SeqFeature.FeatureLocation, Bio.SeqFeature.CompoundLocation]

Returns a BioPython interval type; since they do not have a shared base class, we need a union

abstract first_ancestor_of_type(sequence_type: Union[str, SequenceType]) AbstractParent

Returns the Parent object representing the closest ancestor (parent, parent of parent, etc.) of this location which has the given sequence type. Raises NoSuchAncestorException if no ancestor with the given type exists.

abstract has_ancestor_of_type(sequence_type: Union[str, SequenceType]) bool

Returns True if some ancestor (parent, parent of parent, etc.) of this location has the given sequence type, or False otherwise.

abstract lift_over_to_first_ancestor_of_type(sequence_type: Union[str, SequenceType]) AbstractLocation

Returns a new Location representing the liftover of this Location to its closest ancestor sequence (parent, parent of parent, etc.) which has the given sequence type. If the immediate parent has the given type, returns this Location. Raises NoSuchAncestorException if no ancestor with the given type exists.

has_ancestor_sequence(sequence: AbstractSequence) bool

Returns True iff this Location has some ancestor (parent, parent of parent, etc.) whose sequence attribute is equal to the given sequence

lift_over_to_sequence(sequence: AbstractSequence) AbstractLocation

Returns a new Location representing the liftover of this Location to the given sequence. The given sequence must be equal to the sequence attribute of some Parent in the ancestor hierarchy of this Location; otherwise, raises NoSuchAncestorException.

abstract intersection(other: AbstractLocation, match_strand: bool = True, full_span: bool = False, strict_parent_compare: bool = False) AbstractLocation

Returns a new Location representing the intersection of this Location with the other Location. Returned Location, if nonempty, has the same Strand as this Location. This operation is commutative if match_strand is True.

Parameters
  • other – Other location

  • match_strand – If set to True, automatically return EmptyLocation() if other Location has a different Strand than this Location

  • full_span – If set to True, compare the full span of this Location to the full span of the other Location.

  • strict_parent_compare – Raise MismatchedParentException if parents do not match
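
The strand rules can be illustrated with two contiguous intervals given as (start, end, strand) tuples, using None to stand in for an empty location. This sketch is hypothetical. It also shows why the operation is only guaranteed to be commutative when match_strand is True: the result takes the strand of the first operand.

```python
from typing import Optional, Tuple

Interval = Tuple[int, int, str]  # (start, end exclusive, strand)


def intersection(a: Interval, b: Interval, match_strand: bool = True) -> Optional[Interval]:
    """Hypothetical sketch; None stands in for an empty location."""
    (a_start, a_end, a_strand), (b_start, b_end, b_strand) = a, b
    if match_strand and a_strand != b_strand:
        return None
    start, end = max(a_start, b_start), min(a_end, b_end)
    if start >= end:
        return None
    # The result carries the strand of the first operand.
    return (start, end, a_strand)


plus = (0, 10, "+")
minus = (5, 15, "-")
strict = intersection(plus, minus)
forward = intersection(plus, minus, match_strand=False)
backward = intersection(minus, plus, match_strand=False)
```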

abstract union(other: AbstractLocation) AbstractLocation

Returns a new Location representing the union of this Location with the other Location. This operation is commutative. Raises exception if locations cannot be combined.

abstract union_preserve_overlaps(other: AbstractLocation) AbstractLocation

Returns a new Location representing the union of this Location with the other Location, retaining overlapping blocks where applicable. This operation is commutative. Raises exception if locations cannot be combined.

abstract minus(other: AbstractLocation, match_strand: bool = True, strict_parent_compare: bool = False) AbstractLocation

Returns a new Location representing this Location minus its intersection with the other Location. Returned Location has the same Strand as this Location. If there is no intersection, returns this Location. This operation is not commutative.

Parameters
  • other – Other location

  • match_strand – If set to True, automatically return this Location if other Location has a different Strand than this Location

  • strict_parent_compare – Raise MismatchedParentException if parents do not match
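
A minimal sketch of subtraction for two contiguous intervals, returning the remaining pieces as a list of (start, end, strand) tuples. The representation is hypothetical. A concrete implementation would return a single Location whose blocks correspond to these pieces.

```python
from typing import List, Tuple

Interval = Tuple[int, int, str]  # (start, end exclusive, strand)


def minus(a: Interval, b: Interval, match_strand: bool = True) -> List[Interval]:
    """Hypothetical sketch: pieces of a that remain after removing b."""
    a_start, a_end, a_strand = a
    b_start, b_end, b_strand = b
    if match_strand and a_strand != b_strand:
        return [a]
    if b_end <= a_start or b_start >= a_end:
        # No intersection: a is returned unchanged.
        return [a]
    pieces: List[Interval] = []
    if a_start < b_start:
        pieces.append((a_start, b_start, a_strand))
    if b_end < a_end:
        pieces.append((b_end, a_end, a_strand))
    return pieces


outer = (0, 10, "+")
inner = (3, 6, "+")
split = minus(outer, inner)
swapped = minus(inner, outer)
```

Swapping the operands gives a different result, which is the non-commutativity noted above.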

abstract extend_absolute(extend_start: int, extend_end: int) AbstractLocation

Returns a new Location representing this Location with start and end positions extended by the given values, ignoring Strand. Returned Location has same Strand as this Location.

Parameters
  • extend_start – Non-negative integer: amount to extend start

  • extend_end – Non-negative integer: amount to extend end

abstract extend_relative(extend_upstream: int, extend_downstream: int) AbstractLocation

Returns a new Location extended upstream and downstream relative to this Location’s Strand.

Parameters
  • extend_upstream – Non-negative integer: amount to extend upstream relative to Strand

  • extend_downstream – Non-negative integer: amount to extend downstream relative to Strand
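
The difference between absolute and strand-relative extension can be sketched as follows for a contiguous interval. Both functions are hypothetical stand-ins. The sketch performs no bounds checking against a parent sequence, which a real implementation may require.

```python
from typing import Tuple

Interval = Tuple[int, int, str]  # (start, end exclusive, strand)


def extend_absolute(loc: Interval, extend_start: int, extend_end: int) -> Interval:
    """Hypothetical sketch: extend in parent coordinates, ignoring strand."""
    if extend_start < 0 or extend_end < 0:
        raise ValueError("Extension amounts must be non-negative")
    start, end, strand = loc
    return (start - extend_start, end + extend_end, strand)


def extend_relative(loc: Interval, extend_upstream: int, extend_downstream: int) -> Interval:
    """Hypothetical sketch: upstream means lower coordinates on "+", higher on "-"."""
    if loc[2] == "-":
        return extend_absolute(loc, extend_downstream, extend_upstream)
    return extend_absolute(loc, extend_upstream, extend_downstream)


plus_extended = extend_relative((10, 20, "+"), 2, 3)
minus_extended = extend_relative((10, 20, "-"), 2, 3)
```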

abstract contains(other: AbstractLocation, match_strand: bool = False, full_span: bool = False, strict_parent_compare: bool = False) bool

Returns True iff this location contains the other. If full_span is True, the full spans of both locations are compared.

Parameters
  • other – Other location

  • match_strand – If set to True, automatically return False if other Location has a different Strand than this Location

  • full_span – If set to True, compare the full span of this Location to the full span of the other Location.

  • strict_parent_compare – Raise MismatchedParentException if parents do not match

class biocantor.AbstractSequence

Bases: abc.ABC

Shared AbstractSequence base class simplifies imports for type checking

__slots__ = ['sequence', 'alphabet', 'id', 'sequence_type', 'parent', '_len']
sequence_type :SequenceType
_len :int
sequence :str
alphabet :Alphabet
parent :Optional[AbstractParent]
id :Optional[str]
__len__()
class biocantor.AbstractParent

Bases: abc.ABC

Shared AbstractParent base class simplifies imports for type checking

abstract property strand: Optional[Strand]

Returns the Strand of this Parent. If this Parent has no explicit Strand, but has a Location, that Location’s Strand is returned.

__slots__ = ['parent', 'id', 'sequence_type', '_strand', 'location', 'sequence', '_strand_property']
id :Optional[str]
sequence_type :Optional[SequenceType]
sequence :Optional[AbstractSequence]
location :Optional[AbstractLocation]
parent :Optional[AbstractParent]
_strand :Optional[Strand]
abstract equals_except_location(other, require_same_sequence: bool = True)

Checks that this Parent is equal to another Parent, ignoring the associated Location members.

By default, also checks that any associated Sequence objects match; set require_same_sequence to False to skip this check.

abstract strip_location_info() AbstractParent

Returns a new Parent object representing this Parent with information about child location removed

abstract first_ancestor_of_type(sequence_type: Union[str, SequenceType], include_self: bool = True) AbstractParent

Returns the Parent object representing the closest ancestor (parent, parent of parent, etc.) of this Parent which has the given sequence type. If include_self is True and this Parent has the given type, returns this object. Raises NoSuchAncestorException if no ancestor with the given type exists.

Parameters
  • sequence_type (str) – Sequence type

  • include_self – Include this sequence as a candidate

abstract has_ancestor_of_type(sequence_type: Union[str, SequenceType], include_self: bool = True) bool

Returns True if some ancestor (parent, parent of parent, etc.) of this Parent has the given sequence type, or False otherwise. If include_self is True and this Parent has the given type, returns True.

Parameters
  • sequence_type (str) – Sequence type

  • include_self – Include this sequence as a candidate
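
The ancestor lookups described above amount to walking the parent chain. The sketch below uses a hypothetical Parent dataclass and a locally defined NoSuchAncestorException rather than biocantor's classes, and it compares sequence types as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


class NoSuchAncestorException(Exception):
    """Local stand-in for the exception named in the reference."""


@dataclass
class Parent:
    """Hypothetical minimal parent node."""

    id: str
    sequence_type: Optional[str] = None
    parent: Optional["Parent"] = None

    def first_ancestor_of_type(self, sequence_type: str, include_self: bool = True) -> "Parent":
        node = self if include_self else self.parent
        # Walk upward until a node of the requested type is found.
        while node is not None:
            if node.sequence_type == sequence_type:
                return node
            node = node.parent
        raise NoSuchAncestorException(f"No ancestor of type {sequence_type!r}")

    def has_ancestor_of_type(self, sequence_type: str, include_self: bool = True) -> bool:
        try:
            self.first_ancestor_of_type(sequence_type, include_self)
        except NoSuchAncestorException:
            return False
        return True


chromosome = Parent("chr1", "chromosome")
chunk = Parent("chunk1", "sequence_chunk", parent=chromosome)
found = chunk.first_ancestor_of_type("chromosome")
```

Setting include_self to False excludes the starting node, so a chunk whose only chunk-typed node is itself reports no such ancestor.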

abstract lift_child_location_to_parent()

Lifts location of child object on this parent to the parent of this parent. Raises ValueError if any required data is missing (child location or location of this parent on its parent).

Returns

Child object location lifted to the parent of this parent

Return type

Location

abstract reset_location(location) AbstractParent

Returns a new Parent object with child location set to the given location

abstract has_ancestor_sequence(sequence, include_self: bool = True) bool

Returns True iff this Parent has some ancestor (parent, parent of parent, etc.) whose sequence attribute is equal to the given sequence. If include_self is True and this Parent has sequence attribute equal to the given sequence, returns True.