Instantiating BioCantor objects
Instantiate a Location object
Location objects store coordinates and a Strand. Here we demonstrate the two main subclasses of the abstract base class Location: SingleInterval (a single contiguous interval) and CompoundInterval (a set of possibly disjoint contiguous blocks).
Locations can optionally be situated within a coordinate system or hierarchy of coordinate systems by passing Parent information to them.
Imports
[1]:
# NOTE: Currently this import order matters, because there is a circular dependency between inscripta.biocantor.sequence and inscripta.biocantor.parent
from inscripta.biocantor.location.location_impl import SingleInterval, CompoundInterval, Strand
from inscripta.biocantor.sequence import Sequence, Alphabet
from inscripta.biocantor.parent import Parent
SingleInterval without parent
A simple interval with start, end and strand; no coordinate system is specified.
[2]:
simple_interval = SingleInterval(3, 5, Strand.MINUS)
SingleInterval with parent
Location constructors include an optional parent argument. A full Parent object can be passed to this argument. Alternatively, if that Parent object would be simple, shortcuts are provided to pass other object types for the parameter, avoiding having to call the Parent constructor directly. See below for examples of this.
[3]:
# Construct a sequence that locations will be defined relative to
seq = Sequence("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", Alphabet.NT_STRICT)
# Construct an interval relative to the sequence. Pass a Parent object to its constructor which specifies
# parent ID and the sequence
interval_on_seq = SingleInterval(3, 5, Strand.MINUS, parent=Parent(id="parent", sequence=seq))
Here are some examples where the parent has only one attribute. The shortcut can be used, passing the one attribute directly instead of calling the Parent constructor.
(Note: this shortcut works for any parent that has only one attribute, except when that attribute is the sequence_type argument to the Parent constructor.)
[4]:
# Parent has an ID (name) only
interval_with_parent_id = SingleInterval(3, 5, Strand.MINUS, parent="parent")
# Parent has a sequence only
interval_with_parent_seq = SingleInterval(3, 5, Strand.MINUS, parent=seq)
# Note: if only a string is passed for parent, that string is assumed to represent the parent ID.
# The parent constructor has one other string argument: sequence_type. In the event you want to construct
# a Location with a parent that has only a sequence_type attribute, you need to call the Parent constructor.
interval_with_parent_type = SingleInterval(3, 5, Strand.MINUS, parent=Parent(sequence_type="chromosome"))
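The shortcut rule above can be pictured as a small type dispatch. The following is a toy model in plain Python, not BioCantor's implementation: ToySequence, ToyParent, and make_parent are hypothetical names invented for illustration. It shows why a bare string can only ever stand for the ID, so a sequence_type-only parent has to be built explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ToySequence:
    """Hypothetical stand-in for a sequence object; not BioCantor's Sequence."""
    data: str


@dataclass
class ToyParent:
    """Hypothetical stand-in for a parent; not BioCantor's Parent."""
    id: Optional[str] = None
    sequence: Optional[ToySequence] = None
    sequence_type: Optional[str] = None


def make_parent(obj: Union[ToyParent, ToySequence, str, None]) -> Optional[ToyParent]:
    """Normalize a shortcut value into a ToyParent (illustrative only)."""
    if obj is None or isinstance(obj, ToyParent):
        return obj
    if isinstance(obj, str):
        # A bare string is always read as the ID, never as sequence_type
        return ToyParent(id=obj)
    if isinstance(obj, ToySequence):
        return ToyParent(sequence=obj)
    raise TypeError(f"Cannot build a parent from {type(obj).__name__}")


by_id = make_parent("parent")
by_seq = make_parent(ToySequence("AAAA"))
# A sequence_type-only parent needs the explicit constructor
by_type = make_parent(ToyParent(sequence_type="chromosome"))
```

In this sketch the two string-valued attributes are ambiguous when given as a bare string, which is one plausible reason for the exception noted above.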
CompoundInterval from start and end coordinates, no parent
[5]:
simple_compound_interval = CompoundInterval([5, 10, 20, 30], [6, 15, 22, 35], Strand.PLUS)
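The constructor call above takes two parallel lists. As a plain-Python illustration (no BioCantor calls), the sketch below pairs them up, assuming the first list holds block starts, the second holds block ends, and blocks are 0-based and half-open. Those conventions are assumptions here, not something this page states.

```python
starts = [5, 10, 20, 30]
ends = [6, 15, 22, 35]

# Pair the i-th start with the i-th end to get one block per pair
blocks = list(zip(starts, ends))

# Assumption: half-open blocks, so each block spans end - start positions
block_lengths = [end - start for start, end in blocks]
total_length = sum(block_lengths)

print(blocks)        # [(5, 6), (10, 15), (20, 22), (30, 35)]
print(total_length)  # 13
```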
CompoundInterval from list of SingleIntervals, no parent
[6]:
compound_interval_from_blocks = CompoundInterval.from_single_intervals([
    SingleInterval(2, 5, Strand.PLUS), SingleInterval(8, 11, Strand.PLUS)])
CompoundInterval with parent
Parent information can be passed to CompoundIntervals in the same way as SingleIntervals.
Instantiate a Sequence object
Sequence objects store sequence data, an Alphabet, and several optional attributes. Sequences can optionally be situated within a coordinate system or hierarchy of coordinate systems by passing Parent information to them.
Minimal Sequence
[7]:
simple_sequence = Sequence("AAACCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", Alphabet.NT_STRICT)
Sequence with parent
In this example, a Sequence represents a slice of a chromosome and has a parent attribute reflecting this. The location argument to the Parent constructor represents the location of the child relative to that parent. In this case, it is the location of the chromosome slice relative to the chromosome.
[8]:
chromosome_slice = Sequence("TTTTTTT", Alphabet.NT_STRICT,
                            parent=Parent(id="chr1",
                                          location=SingleInterval(1000, 1007, Strand.PLUS),
                                          sequence_type="chromosome"))
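To make "location of the child relative to the parent" concrete, here is a plain-Python sketch of the coordinate arithmetic such a location implies. It does not call BioCantor. The helper child_to_parent is hypothetical, and it assumes 0-based, half-open intervals with strands written as "+" and "-".

```python
def child_to_parent(pos: int, start: int, end: int, strand: str) -> int:
    """Map a 0-based position on the child to parent coordinates.

    Hypothetical helper. Assumes the child occupies the half-open
    interval [start, end) on the parent.
    """
    length = end - start
    if not 0 <= pos < length:
        raise ValueError(f"Position {pos} is outside a child of length {length}")
    if strand == "+":
        # Child runs in the same direction as the parent
        return start + pos
    if strand == "-":
        # Child runs in the opposite direction, so count back from the end
        return end - 1 - pos
    raise ValueError(f"Unknown strand: {strand!r}")


# First and last base of a 7-base slice located at 1000-1007 on the plus strand
print(child_to_parent(0, 1000, 1007, "+"))  # 1000
print(child_to_parent(6, 1000, 1007, "+"))  # 1006
```

Under the same assumptions, a child on the minus strand at 33-40 would have its first base at parent position 39.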
Other optional Sequence attributes
Sequences can also have an optional id (name) and type (string representing a sequence type, for downstream calculations). Finally, the Sequence constructor has an optional parameter validate_alphabet which, when set to False, disables the requirement that the sequence data conform to the provided Alphabet.
[9]:
seq_with_all_attributes = Sequence(data="AAAAAAA",
                                   alphabet=Alphabet.NT_STRICT,
                                   id="my_sequence",
                                   type="chromosome_slice",
                                   parent=Parent(id="chr1", location=SingleInterval(33, 40, Strand.MINUS)))
seq_with_no_alphabet_validation = Sequence("XXXXXXX", Alphabet.NT_STRICT, validate_alphabet=False)
Instantiate a Parent object
Parent is a fairly abstract concept in BioCantor. Parent objects can represent different aspects of a relationship between an implied child and the given parent. Parent objects do not store pointers to children; a child is implied but cannot be directly accessed from the Parent. Instead, objects store pointers to their Parents. Locations, Sequences, and Parents can have Parents.
All arguments to the Parent constructor are optional and any combination of them may be passed. The important thing to understand is that some Parent attributes describe the Parent object in its own right, while some describe the relationship of an implied child to the parent.
Parent attributes that describe the parent object itself:
id
sequence_type
sequence
parent (Parent of this parent)
Parent attributes that define the relationship of the implied child to the parent:
location (The location of the child relative to this parent)
strand (The strand of the child relative to this parent)
Furthermore, some arguments are redundant. For example, a location already includes a strand, so strand is not needed when location is passed; strand is offered separately for the case where the child has a strand relative to its parent but no further location data. Similarly, sequence and sequence_type are mutually redundant if sequence includes a sequence type. In these cases both arguments may be passed as long as their values agree; conflicting values cause an exception to be raised.
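The redundancy rule can be sketched as a small consistency check. This is a toy illustration in plain Python, not BioCantor's code: resolve_strand is a hypothetical helper, strands are written as "+" and "-", and ValueError is an assumed exception type (this page does not say which exception is raised).

```python
from typing import Optional


def resolve_strand(location_strand: Optional[str], strand: Optional[str]) -> Optional[str]:
    """Reconcile a strand carried by a location with an explicit strand.

    Hypothetical helper: either source may be missing; if both are
    present they must agree.
    """
    if location_strand is not None and strand is not None and location_strand != strand:
        # Assumed exception type, chosen for this sketch only
        raise ValueError(
            f"Conflicting strands: location has {location_strand!r}, strand is {strand!r}"
        )
    return location_strand if location_strand is not None else strand


print(resolve_strand("+", None))  # strand taken from the location
print(resolve_strand(None, "-"))  # strand given without any location
print(resolve_strand("+", "+"))   # redundant but consistent
```

Calling resolve_strand("+", "-") in this sketch raises ValueError, mirroring the rule that conflicting values are rejected.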
Parent with an ID only
This is useful when multiple child objects need to be associated with a parent (say, for purposes of coordinate calculations requiring that locations refer to the same coordinate system), but no other parent data is required. For example, this could be used to associate many features annotated to the same chromosome without needing to hold the chromosome sequence in memory.
[10]:
parent_with_id_only = Parent(id="chr1")
Parent representing a sequence with an ID, and a location for the implied child
[11]:
parent_with_seq_and_location = Parent(id="seq", sequence=seq, location=SingleInterval(11, 18, Strand.PLUS))
Parent with its own parent, establishing a multi-layer hierarchy
This Parent object represents a chunk of a chromosome and is associated with the full chromosome as its own parent. Note that the location of the chromosome slice relative to the chromosome is specified in the parent of this parent. This is because the location argument defines the location of the child relative to the parent.
[12]:
parent_with_parent = Parent(id="chr1:1000-2000",
                            sequence_type="chromosome_slice",
                            parent=Parent(id="chr1",
                                          sequence_type="chromosome",
                                          location=SingleInterval(1000, 2000, Strand.PLUS)))
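Because each object points only to its own parent, a hierarchy like the one above is traversed from the bottom up. The sketch below models that with a toy class in plain Python; ToyNode and lineage are hypothetical names and do not correspond to BioCantor classes or methods.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyNode:
    """Hypothetical stand-in for an object that knows only its parent."""
    id: str
    parent: Optional["ToyNode"] = None


def lineage(node: Optional[ToyNode]) -> List[str]:
    """Collect IDs from the given node up to the root of the hierarchy."""
    ids = []
    while node is not None:
        ids.append(node.id)
        node = node.parent
    return ids


chrom = ToyNode(id="chr1")
chunk = ToyNode(id="chr1:1000-2000", parent=chrom)

print(lineage(chunk))  # ['chr1:1000-2000', 'chr1']
```

Note that chrom holds no reference to chunk in this model, matching the statement that a child is implied but cannot be reached from its parent.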