Location operations

Instantiate a Location

[ ]:
from inscripta.biocantor.location.location_impl import SingleInterval, CompoundInterval, EmptyLocation, Strand
from inscripta.biocantor.sequence import Sequence, Alphabet
from inscripta.biocantor.parent.parent import Parent

# No parent
single_interval = SingleInterval(5, 10, Strand.PLUS)
compound_interval = CompoundInterval([2, 8], [5, 13], Strand.PLUS)

# With parent sequence
compound_interval_with_sequence = CompoundInterval([2, 8], [5, 13], Strand.PLUS,
                                                   parent=Sequence(
                                                       "CTACGACTTCCGAGTCCAAAGTGTCCGTGT",
                                                       Alphabet.NT_STRICT,
                                                       type="chromosome",
                                                   ))

# Empty location (implemented as a singleton)
# Rarely needs to be instantiated directly; methods return it where appropriate
empty = EmptyLocation()

Data access

Start, end, strand

[ ]:
compound_interval.start, compound_interval.end, compound_interval.strand
(2, 13, <Strand.PLUS: 1>)

Other basic properties

[ ]:
compound_interval.num_blocks
2
[ ]:
compound_interval.is_contiguous
False
[ ]:
single_interval.is_empty
False
[ ]:
EmptyLocation().is_empty
True
[ ]:
single_interval.parent is None
True
[ ]:
compound_interval_with_sequence.parent
<Parent: id=None, type=chromosome, strand=+, location=CompoundInterval <2-5:+, 8-13:+>, sequence=<Sequence;
  Alphabet=NT_STRICT;
  Length=30;
  Parent=None;
  Type=chromosome>, parent=None>

List of contiguous blocks

[ ]:
compound_interval.blocks
[<SingleInterval 2-5:+>, <SingleInterval 8-13:+>]

Iterator over contiguous blocks in strand-relative order

[ ]:
block_iter = CompoundInterval([1, 8], [3, 10], Strand.MINUS).scan_blocks()
list(block_iter)
[<SingleInterval 8-10:->, <SingleInterval 1-3:->]

Extract underlying spliced sequence

[ ]:
compound_interval_with_sequence.extract_sequence()
<Sequence=ACGTCCGA;
  Alphabet=NT_STRICT;
  Length=8;
  Parent=None;
  Type=None>
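
As a rough illustration of what "spliced" means here, the following pure-Python sketch concatenates the half-open slice of each block in order. It does not use BioCantor: `splice` is a hypothetical helper, and it covers the plus strand only (a minus-strand location would presumably also need reverse complementing, which is omitted).

```python
def splice(sequence, starts, ends):
    """Concatenate the half-open slices [start, end) of `sequence`, in order."""
    return "".join(sequence[s:e] for s, e in zip(starts, ends))


chromosome = "CTACGACTTCCGAGTCCAAAGTGTCCGTGT"
spliced = splice(chromosome, [2, 8], [5, 13])
print(spliced)  # ACGTCCGA
```

The result has length 3 + 5 = 8, matching the `Length=8` reported above.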

Set theoretic operations

Overlap

[ ]:
SingleInterval(5, 10, Strand.PLUS).has_overlap(SingleInterval(9, 20, Strand.PLUS))
True
[ ]:
SingleInterval(5, 10, Strand.PLUS).has_overlap(SingleInterval(9, 20, Strand.MINUS))
True
[ ]:
SingleInterval(5, 10, Strand.PLUS).has_overlap(SingleInterval(9, 20, Strand.MINUS), match_strand=True)
False

Intersection

[ ]:
CompoundInterval([2, 8], [5, 13], Strand.PLUS).intersection(SingleInterval(4, 10, Strand.PLUS))
CompoundInterval <4-5:+, 8-10:+>
[ ]:
SingleInterval(0, 3, Strand.PLUS).intersection(SingleInterval(5, 8, Strand.PLUS))
EmptyLocation
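
The two results above are consistent with intersecting blocks as half-open intervals. A minimal sketch of that idea in plain Python, ignoring strand; `intersect_blocks` is a hypothetical helper and not part of BioCantor:

```python
def intersect_blocks(blocks_a, blocks_b):
    """Pairwise-intersect two lists of half-open (start, end) blocks."""
    result = []
    for a_start, a_end in blocks_a:
        for b_start, b_end in blocks_b:
            start, end = max(a_start, b_start), min(a_end, b_end)
            if start < end:  # keep only non-empty overlaps
                result.append((start, end))
    return sorted(result)


print(intersect_blocks([(2, 5), (8, 13)], [(4, 10)]))  # [(4, 5), (8, 10)]
print(intersect_blocks([(0, 3)], [(5, 8)]))            # []
```

An empty list here plays the role that `EmptyLocation` plays in the library.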

Union

[ ]:
CompoundInterval([0, 10], [5, 15], Strand.PLUS).union(CompoundInterval([0, 8], [7, 9], Strand.PLUS))
CompoundInterval <0-7:+, 8-9:+, 10-15:+>

Contains

Check whether each block of the other location is contained in a block of this location

[ ]:
compound_interval.contains(CompoundInterval([2, 10], [4, 11], Strand.PLUS))
True

Minus

[ ]:
SingleInterval(10, 20, Strand.PLUS).minus(SingleInterval(13, 15, Strand.PLUS))
CompoundInterval <10-13:+, 15-20:+>

Gaps (introns)

[ ]:
# List of gaps as SingleInterval objects, ordered relative to location strand
CompoundInterval([10, 20, 30], [15, 25, 35], Strand.MINUS).gap_list()
[<SingleInterval 25-30:->, <SingleInterval 15-20:->]
[ ]:
# All gaps as one Location object
CompoundInterval([10, 20, 30], [15, 25, 35], Strand.MINUS).gaps_location()
CompoundInterval <15-20:-, 25-30:->
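
To make the ordering explicit, here is a small sketch that derives gaps from a list of half-open blocks and reverses them for the minus strand. `gaps_between` is a hypothetical helper written for illustration, assuming non-overlapping blocks; it is not BioCantor's implementation.

```python
def gaps_between(blocks, minus_strand=False):
    """Return the gaps between consecutive blocks, in strand-relative order."""
    ordered = sorted(blocks)
    found = [
        (prev_end, next_start)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
        if prev_end < next_start  # skip adjacent blocks with no gap
    ]
    # On the minus strand, the first gap encountered is the rightmost one.
    return found[::-1] if minus_strand else found


blocks = [(10, 15), (20, 25), (30, 35)]
print(gaps_between(blocks, minus_strand=True))  # [(25, 30), (15, 20)]
print(gaps_between(blocks))                     # [(15, 20), (25, 30)]
```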

Other feature arithmetic operations

Distance to another location

[ ]:
compound_interval.distance_to(SingleInterval(20, 30, Strand.MINUS))
7
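
The value 7 is consistent with measuring the gap between the overall spans 2-13 and 20-30 while ignoring strand. A minimal sketch under that assumption follows; `span_distance` is hypothetical, and the library may offer other distance definitions not covered here.

```python
def span_distance(start_a, end_a, start_b, end_b):
    """Gap between two half-open spans; 0 if they touch or overlap."""
    return max(0, max(start_a, start_b) - min(end_a, end_b))


print(span_distance(2, 13, 20, 30))  # 7
print(span_distance(2, 13, 10, 30))  # 0
```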

Extend endpoints, returning a new Location

[ ]:
SingleInterval(5, 10, Strand.MINUS).extend_relative(3, 4)
<SingleInterval 1-13:->
[ ]:
SingleInterval(5, 10, Strand.MINUS).extend_absolute(3, 4)
<SingleInterval 2-14:->
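
The two outputs differ because the first call is strand-aware and the second is not. The sketch below reproduces both results on raw coordinates, reading the arguments as (upstream, downstream) for the relative case and (left, right) for the absolute case; that reading is an assumption consistent with the outputs shown, and both function names are hypothetical.

```python
def extend_absolute_coords(start, end, left, right):
    """Extend by fixed amounts on the left and right, ignoring strand."""
    return start - left, end + right


def extend_relative_coords(start, end, minus_strand, upstream, downstream):
    """Extend upstream/downstream as seen from the feature's own strand."""
    if minus_strand:
        # On the minus strand, upstream points toward higher coordinates.
        return start - downstream, end + upstream
    return start - upstream, end + downstream


print(extend_relative_coords(5, 10, True, 3, 4))  # (1, 13)
print(extend_absolute_coords(5, 10, 3, 4))        # (2, 14)
```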

Reverse or reset strand

[ ]:
compound_interval.reverse_strand()
CompoundInterval <2-5:-, 8-13:->
[ ]:
compound_interval.reset_strand(Strand.MINUS)
CompoundInterval <2-5:-, 8-13:->

Reverse the location, flipping both strand and block structure

[ ]:
compound_interval.reverse()
CompoundInterval <2-7:-, 10-13:->
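
One way to picture the block flip is as mirroring every block around the midpoint of the overall span. The following sketch does this on plain tuples; `mirror_blocks` is a hypothetical helper that illustrates the geometry only and leaves the strand flip aside.

```python
def mirror_blocks(blocks):
    """Mirror half-open blocks within the span they collectively cover."""
    span_start = min(start for start, _ in blocks)
    span_end = max(end for _, end in blocks)
    pivot = span_start + span_end
    # A block [s, e) maps to [pivot - e, pivot - s), preserving its length.
    return sorted((pivot - end, pivot - start) for start, end in blocks)


print(mirror_blocks([(2, 5), (8, 13)]))  # [(2, 7), (10, 13)]
```

The 3-base and 5-base blocks swap ends of the span, and the 3-base gap between them moves from 5-8 to 7-10.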

Shift entire location left or right

[ ]:
SingleInterval(3, 5, Strand.MINUS).shift_position(-2)
<SingleInterval 1-3:->

Iterator over (spliced) windows

[ ]:
window_iter = CompoundInterval([1, 8], [6, 15], Strand.MINUS).scan_windows(window_size=3, step_size=2, start_pos=0)
[ ]:
list(window_iter)
[<SingleInterval 12-15:->,
 <SingleInterval 10-13:->,
 <SingleInterval 8-11:->,
 CompoundInterval <4-6:-, 8-9:->,
 <SingleInterval 2-5:->]
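
The window that comes back as a `CompoundInterval` is the one that straddles the gap between the two blocks. The sketch below reproduces the same windows by listing positions in strand-relative order and slicing them. It is a deliberately simple illustration, not BioCantor's implementation; `spliced_windows` is a hypothetical helper whose parameter names mirror the call above.

```python
def spliced_windows(blocks, minus_strand, window_size, step_size, start_pos=0):
    """Return each window as a list of half-open (start, end) blocks."""
    # Every covered position, in the order the strand reads them.
    positions = [p for start, end in sorted(blocks) for p in range(start, end)]
    if minus_strand:
        positions.reverse()

    windows = []
    last_offset = len(positions) - window_size
    for offset in range(start_pos, last_offset + 1, step_size):
        covered = sorted(positions[offset:offset + window_size])
        merged = []
        for p in covered:
            if merged and merged[-1][1] == p:
                merged[-1] = (merged[-1][0], p + 1)  # extend current block
            else:
                merged.append((p, p + 1))            # start a new block
        windows.append(merged)
    return windows


for window in spliced_windows([(1, 6), (8, 15)], True, window_size=3, step_size=2):
    print(window)
# [(12, 15)]
# [(10, 13)]
# [(8, 11)]
# [(4, 6), (8, 9)]
# [(2, 5)]
```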

Operations on Parent hierarchy

Identify ancestors in Parent hierarchy

[ ]:
compound_interval_with_sequence.first_ancestor_of_type("chromosome")
<Parent: id=None, type=chromosome, strand=+, location=CompoundInterval <2-5:+, 8-13:+>, sequence=<Sequence;
  Alphabet=NT_STRICT;
  Length=30;
  Parent=None;
  Type=chromosome>, parent=None>
[ ]:
compound_interval_with_sequence.has_ancestor_of_type("other_seq_type")
False
[ ]:
compound_interval_with_sequence.has_ancestor_sequence(
    Sequence("CTACGACTTCCGAGTCCAAAGTGTCCGTGT", Alphabet.NT_STRICT, type="chromosome"))
True

Coordinate conversion

Establish a 3-level hierarchy

  • Highest level: all of chr1

  • Middle level: 30nt slice of chr1

  • Lowest level: a 10nt feature initially defined relative to the 30nt slice

[ ]:
# A Parent object representing a full chromosome
chr1 = Parent(id="chr1", sequence_type="chromosome")

# A slice of chr1 lying at positions 1000-1030
chromosome_slice_location = SingleInterval(1000, 1030, Strand.PLUS, parent=chr1)
chromosome_slice = Sequence("CTGATAGGGGATGCAGTATATCCCTGGATA", Alphabet.NT_STRICT,
                            parent=chr1.reset_location(location=chromosome_slice_location))

# A feature defined relative to the slice
feature = SingleInterval(5, 15, Strand.MINUS, parent=chromosome_slice)

Convert the feature to chromosome coordinates

[ ]:
feature.lift_over_to_first_ancestor_of_type("chromosome")
<SingleInterval <Parent: id=chr1, type=chromosome, strand=-, location=<SingleInterval 1005-1015:->, sequence=None, parent=None>:1005-1015:->

Convert a feature-relative position to slice-relative

[ ]:
feature.relative_to_parent_pos(6)
8
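
The result follows from counting along the feature's own strand: on a minus-strand feature spanning 5-15, relative position 0 is parent position 14, so relative position 6 is 14 - 6 = 8. A minimal sketch for a single contiguous interval; `relative_pos_to_parent` is a hypothetical helper, not the library method.

```python
def relative_pos_to_parent(start, end, minus_strand, relative_pos):
    """Map a strand-relative offset within [start, end) to a parent position."""
    if not 0 <= relative_pos < end - start:
        raise ValueError("relative position lies outside the interval")
    if minus_strand:
        return end - 1 - relative_pos  # count down from the last base
    return start + relative_pos        # count up from the first base


print(relative_pos_to_parent(5, 15, True, 6))   # 8
print(relative_pos_to_parent(5, 15, False, 6))  # 11
```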

Convert a feature-relative interval to slice-relative

[ ]:
feature.relative_interval_to_parent_location(7, 9, Strand.MINUS)
<SingleInterval <Parent: id=None, type=None, strand=+, location=<SingleInterval 6-8:+>, sequence=<Sequence;
  Alphabet=NT_STRICT;
  Length=30;
  Parent=<Parent: id=chr1, type=chromosome, strand=+, location=<SingleInterval <Parent: id=chr1, type=chromosome, strand=+, location=<SingleInterval 1000-1030:+>, sequence=None, parent=None>:1000-1030:+>, sequence=None, parent=None>;
  Type=None>, parent=<Parent: id=chr1, type=chromosome, strand=+, location=<SingleInterval <Parent: id=chr1, type=chromosome, strand=+, location=<SingleInterval 1000-1030:+>, sequence=None, parent=None>:1000-1030:+>, sequence=None, parent=None>>:6-8:+>

Convert a chromosome-relative position to slice-relative

[ ]:
chromosome_slice.location_on_parent.parent_to_relative_pos(1007)
7

Convert a chromosome-relative feature to slice-relative

[ ]:
chromosome_slice.location_on_parent.parent_to_relative_location(
    SingleInterval(990, 1010, Strand.MINUS, parent=chr1))
<SingleInterval 0-10:->
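
Note that the input interval 990-1010 starts before the slice, so only the overlapping part 1000-1010 survives the conversion. A sketch of that clip-then-offset step, assuming a plus-strand slice so that the strand is unchanged; `parent_interval_to_slice` is a hypothetical helper.

```python
def parent_interval_to_slice(start, end, slice_start, slice_end):
    """Clip [start, end) to the slice, then express it in slice coordinates."""
    clipped_start = max(start, slice_start)
    clipped_end = min(end, slice_end)
    if clipped_start >= clipped_end:
        return None  # no overlap with the slice
    return clipped_start - slice_start, clipped_end - slice_start


print(parent_interval_to_slice(990, 1010, 1000, 1030))  # (0, 10)
```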

Location of one feature relative to another feature

Express the intersection of two locations in coordinates relative to one of the locations

[ ]:
feature.location_relative_to(SingleInterval(11, 20, Strand.PLUS, parent=chromosome_slice))
<SingleInterval 0-4:->
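
Here the feature (5-15, minus) and the other location (11-20, plus) overlap at 11-15, which is offset 0-4 within the other location. The sketch below reproduces that on plain tuples. `relative_overlap` is a hypothetical helper; its plus-strand branch matches the output above, while the minus-strand branch is an assumption about how the convention would extend and has not been checked against the library.

```python
def relative_overlap(feature, other):
    """Express feature's overlap with other in other's coordinates.

    Both arguments are (start, end, strand) with strand "+" or "-".
    """
    f_start, f_end, f_strand = feature
    o_start, o_end, o_strand = other
    start, end = max(f_start, o_start), min(f_end, o_end)
    if start >= end:
        return None  # the two locations do not overlap
    if o_strand == "+":
        rel_start, rel_end = start - o_start, end - o_start
    else:
        # Assumed convention: count from the right end of a minus-strand reference.
        rel_start, rel_end = o_end - end, o_end - start
    rel_strand = "+" if f_strand == o_strand else "-"
    return rel_start, rel_end, rel_strand


print(relative_overlap((5, 15, "-"), (11, 20, "+")))  # (0, 4, '-')
```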