Instantiating BioCantor objects

Instantiate a Location object

Location objects store coordinates and a Strand. Here we demonstrate the two main subclasses of the abstract base class Location: SingleInterval (a single contiguous interval) and CompoundInterval (a set of possibly disjoint contiguous blocks).

Locations can optionally be situated within a coordinate system or hierarchy of coordinate systems by passing Parent information to them.

Imports

[1]:
# NOTE: Currently this import order matters, because there is a circular dependency between inscripta.biocantor.sequence and inscripta.biocantor.parent
from inscripta.biocantor.location.location_impl import SingleInterval, CompoundInterval, Strand
from inscripta.biocantor.sequence import Sequence, Alphabet
from inscripta.biocantor.parent import Parent

SingleInterval without parent

A simple interval with start, end and strand; no coordinate system is specified.

[2]:
simple_interval = SingleInterval(3, 5, Strand.MINUS)

SingleInterval with parent

Location constructors take an optional parent argument. You can pass a full Parent object to it. Alternatively, when the Parent would carry only a single attribute, you can pass that attribute directly and skip the Parent constructor. Examples of both forms follow.

[3]:
# Construct a sequence that locations will be defined relative to
seq = Sequence("AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", Alphabet.NT_STRICT)
# Construct an interval relative to the sequence. Pass a Parent object to its constructor which specifies
# parent ID and the sequence
interval_on_seq = SingleInterval(3, 5, Strand.MINUS, parent=Parent(id="parent", sequence=seq))

Here are some examples where the parent has only one attribute. The shortcut can be used, passing the one attribute directly instead of calling the Parent constructor.

(Note: this shortcut works for any parent that has only one attribute, except a parent that has only a sequence_type. A bare string is always interpreted as the parent ID, so sequence_type must be set through the Parent constructor.)

[4]:
# Parent has an ID (name) only
interval_with_parent_id = SingleInterval(3, 5, Strand.MINUS, parent="parent")

# Parent has a sequence only
interval_with_parent_seq = SingleInterval(3, 5, Strand.MINUS, parent=seq)

# Note: if only a string is passed for parent, that string is assumed to represent the parent ID.
# The parent constructor has one other string argument: sequence_type. In the event you want to construct
# a Location with a parent that has only a sequence_type attribute, you need to call the Parent constructor.
interval_with_parent_type = SingleInterval(3, 5, Strand.MINUS, parent=Parent(sequence_type="chromosome"))
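
The shortcut rules above can be pictured as a small coercion step that runs before the parent is stored. The sketch below is not BioCantor's implementation: `ToySequence`, `ToyParent`, and `coerce_parent` are hypothetical stand-ins written in plain Python, and they only mirror the behavior described in this section.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class ToySequence:
    """Hypothetical stand-in for a sequence object."""
    data: str


@dataclass
class ToyParent:
    """Hypothetical stand-in for a parent with optional attributes."""
    id: Optional[str] = None
    sequence: Optional[ToySequence] = None
    sequence_type: Optional[str] = None


def coerce_parent(obj: Union[None, str, ToySequence, ToyParent]) -> Optional[ToyParent]:
    """Turn a shortcut value into a full ToyParent (hypothetical helper)."""
    if obj is None or isinstance(obj, ToyParent):
        # Nothing to do: no parent, or already a full parent object
        return obj
    if isinstance(obj, str):
        # A bare string is always read as the parent ID, never as sequence_type
        return ToyParent(id=obj)
    if isinstance(obj, ToySequence):
        # A bare sequence becomes a parent that has only a sequence
        return ToyParent(sequence=obj)
    raise TypeError(f"Cannot build a parent from {type(obj).__name__}")


print(coerce_parent("parent"))
print(coerce_parent(ToySequence("AAAA")))
```

Because a string can only map to one attribute, a parent with only a sequence_type has no shortcut and has to be built explicitly, as in the last line of the cell above.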

CompoundInterval from start and end coordinates, no parent

[5]:
simple_compound_interval = CompoundInterval([5, 10, 20, 30], [6, 15, 22, 35], Strand.PLUS)
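
To make the two parallel lists easier to read, here is a plain-Python sketch of how the start and end coordinates pair up into blocks. It does not use BioCantor, and it assumes half-open coordinates (each block covers start up to, but not including, end), which is an assumption not stated in this section.

```python
starts = [5, 10, 20, 30]
ends = [6, 15, 22, 35]

# Pair the i-th start with the i-th end to form one block
blocks = list(zip(starts, ends))

# Assumption: half-open coordinates, so each block spans end - start positions
block_lengths = [end - start for start, end in blocks]
total_length = sum(block_lengths)

print(blocks)         # [(5, 6), (10, 15), (20, 22), (30, 35)]
print(block_lengths)  # [1, 5, 2, 5]
print(total_length)   # 13
```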

CompoundInterval from list of SingleIntervals, no parent

[6]:
compound_interval_from_blocks = CompoundInterval.from_single_intervals([
    SingleInterval(2, 5, Strand.PLUS), SingleInterval(8, 11, Strand.PLUS)])

CompoundInterval with parent

Parent information can be passed to a CompoundInterval in the same way as to a SingleInterval.

Instantiate a Sequence object

Sequence objects store sequence data, an Alphabet, and several optional attributes. Sequences can optionally be situated within a coordinate system or hierarchy of coordinate systems by passing Parent information to them.

Minimal Sequence

[7]:
simple_sequence = Sequence("AAACCCAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", Alphabet.NT_STRICT)

Sequence with parent

In this example, a Sequence represents a slice of a chromosome and has a parent attribute reflecting this. The location argument to the Parent constructor represents the location of the child relative to that parent. In this case, it is the location of the chromosome slice relative to the chromosome.

[8]:
chromosome_slice = Sequence("TTTTTTT", Alphabet.NT_STRICT,
                            parent=Parent(id="chr1",
                                          location=SingleInterval(1000, 1007, Strand.PLUS),
                                          sequence_type="chromosome"))
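
One way to read "the location of the child relative to the parent" is as a coordinate mapping from positions on the child to positions on the parent. The helper below is a hypothetical plain-Python sketch, not a BioCantor function. It assumes 0-based, half-open coordinates and represents strands as the strings "+" and "-".

```python
def child_to_parent(pos: int, start: int, end: int, strand: str) -> int:
    """Map a 0-based position on the child to its parent coordinate.

    Hypothetical helper; assumes the child occupies the half-open
    interval [start, end) on the parent.
    """
    if not 0 <= pos < end - start:
        raise ValueError("position lies outside the child")
    if strand == "+":
        return start + pos
    if strand == "-":
        # On the minus strand the child's first position is the parent's last
        return end - 1 - pos
    raise ValueError("strand must be '+' or '-'")


# The 7-base slice located at 1000-1007 on the plus strand of chr1
print(child_to_parent(0, 1000, 1007, "+"))  # 1000
print(child_to_parent(6, 1000, 1007, "+"))  # 1006
```

Under these assumptions, the interval 1000-1007 covers exactly seven positions, which matches the length of the "TTTTTTT" slice.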

Other optional Sequence attributes

Sequences can also have an optional id (name) and type (string representing a sequence type, for downstream calculations). Finally, the Sequence constructor has an optional parameter validate_alphabet which, when set to False, disables the requirement that the sequence data conform to the provided Alphabet.

[9]:
seq_with_all_attributes = Sequence(data="AAAAAAA",
                                   alphabet=Alphabet.NT_STRICT,
                                   id="my_sequence",
                                   type="chromosome_slice",
                                   parent=Parent(id="chr1", location=SingleInterval(33, 40, Strand.MINUS)))

seq_with_no_alphabet_validation = Sequence("XXXXXXX", Alphabet.NT_STRICT, validate_alphabet=False)
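
The effect of validate_alphabet can be pictured with a small stand-alone check. This is a sketch under stated assumptions rather than BioCantor's code: `STRICT_NUCLEOTIDES` and `make_sequence_data` are hypothetical names, the allowed character set is assumed to be A, C, G, T, and `ValueError` is an arbitrary choice of exception.

```python
# Hypothetical allowed characters for a strict nucleotide alphabet (assumption)
STRICT_NUCLEOTIDES = frozenset("ACGT")


def make_sequence_data(data: str,
                       allowed: frozenset = STRICT_NUCLEOTIDES,
                       validate_alphabet: bool = True) -> str:
    """Return data unchanged, optionally checking it against an alphabet."""
    if validate_alphabet:
        unexpected = sorted(set(data) - allowed)
        if unexpected:
            raise ValueError(f"Characters not in alphabet: {unexpected}")
    return data


# Conforming data passes the check
print(make_sequence_data("AAACCC"))

# Non-conforming data is accepted only when validation is switched off
print(make_sequence_data("XXXXXXX", validate_alphabet=False))
```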

Instantiate a Parent object

Parent is a fairly abstract concept in BioCantor. Parent objects can represent different aspects of a relationship between an implied child and the given parent. Parent objects do not store pointers to children; a child is implied but cannot be directly accessed from the Parent. Instead, objects store pointers to their Parents. Locations, Sequences, and Parents can have Parents.

All arguments to the Parent constructor are optional and any combination of them may be passed. The important thing to understand is that some Parent attributes describe the Parent object in its own right, while some describe the relationship of an implied child to the parent.

Parent attributes that describe the parent object itself:

  • id

  • sequence_type

  • sequence

  • parent (Parent of this parent)

Parent attributes that define the relationship of the implied child to the parent:

  • location (The location of the child relative to this parent)

  • strand (The strand of the child relative to this parent)

Furthermore, some arguments are redundant. For example, a location already includes a strand, so strand is not needed when location is passed. Both are offered to cover the case where the child has a strand relative to its parent but no further location data. Similarly, sequence and sequence_type are mutually redundant if the sequence includes a sequence type. In these cases it is fine to pass both arguments as long as they agree; conflicting values cause an exception to be raised.
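
The rule for redundant arguments (accept both when they agree, fail when they conflict) can be sketched in plain Python. The function below is hypothetical and independent of BioCantor: a location is modeled as a `(start, end, strand)` tuple, strands are the strings "+" and "-", and `ValueError` stands in for whatever exception the library actually raises.

```python
from typing import Optional, Tuple


def resolve_strand(location: Optional[Tuple[int, int, str]],
                   strand: Optional[str]) -> Optional[str]:
    """Combine an optional location and an optional strand (hypothetical helper)."""
    if location is None:
        # Strand-only case: the child has a strand but no further location data
        return strand
    location_strand = location[2]
    if strand is not None and strand != location_strand:
        # Redundant arguments are fine only when they agree
        raise ValueError(
            f"strand {strand!r} conflicts with location strand {location_strand!r}"
        )
    return location_strand


print(resolve_strand(None, "-"))             # -
print(resolve_strand((11, 18, "+"), None))   # +
print(resolve_strand((11, 18, "+"), "+"))    # +
```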

Parent with an ID only

This is useful when multiple child objects need to be associated with a parent (say, for purposes of coordinate calculations requiring that locations refer to the same coordinate system), but no other parent data is required. For example, this could be used to associate many features annotated to the same chromosome without needing to hold the chromosome sequence in memory.

[10]:
parent_with_id_only = Parent(id="chr1")

Parent representing a sequence with an ID, and a location for the implied child

[11]:
parent_with_seq_and_location = Parent(id="seq", sequence=seq, location=SingleInterval(11, 18, Strand.PLUS))

Parent with its own parent, establishing a multi-layer hierarchy

This Parent object represents a chunk of a chromosome and is associated with the full chromosome as its own parent. Note that the location of the chromosome slice relative to the chromosome is specified in the parent of this parent, because the location argument always defines the location of the child relative to the parent.

[12]:
parent_with_parent = Parent(id="chr1:1000-2000",
                            sequence_type="chromosome_slice",
                            parent=Parent(id="chr1",
                                          sequence_type="chromosome",
                                          location=SingleInterval(1000, 2000, Strand.PLUS)))
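
Since each object points to its parent and never to its children, a hierarchy like this one can only be traversed upward. The sketch below illustrates that with a hypothetical `ToyParent` dataclass and a `lineage` helper. Neither is part of BioCantor, and the location attribute is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ToyParent:
    """Hypothetical stand-in that stores only a pointer to its own parent."""
    id: str
    sequence_type: Optional[str] = None
    parent: Optional["ToyParent"] = None


def lineage(node: Optional[ToyParent]) -> List[str]:
    """Collect IDs from the given node up to the root of the hierarchy."""
    ids = []
    while node is not None:
        ids.append(node.id)
        node = node.parent
    return ids


chromosome = ToyParent("chr1", "chromosome")
chunk = ToyParent("chr1:1000-2000", "chromosome_slice", parent=chromosome)

print(lineage(chunk))       # ['chr1:1000-2000', 'chr1']
print(lineage(chromosome))  # ['chr1']
```

The chromosome has no way to list its chunks here, which mirrors the statement that a child is implied but cannot be reached from the Parent.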