`biocantor.io.parser`

Core parser functionality. Contains the dataclass ParsedAnnotationRecord, which wraps annotations produced by any of the parsers together with optional sequence information.

Module Contents

Classes

ParsedAnnotationRecord

Dataclass that wraps an AnnotationCollectionModel along with an accompanying SeqRecord to store sequence information.

Functions

`seq_to_parent`(→ inscripta.biocantor.parent.Parent)	Convert a string into a Parent object. This is the intermediate that transfers a BioPython sequence object to a BioCantor sequence object.
`seq_chunk_to_parent`(→ inscripta.biocantor.parent.Parent)	Construct a sequence chunk parent from a sequence. This is used when an annotation collection is being instantiated with a subset of a genome sequence.

class biocantor.io.parser.ParsedAnnotationRecord

Dataclass that wraps an AnnotationCollectionModel along with an accompanying SeqRecord to store sequence information.

This is an intermediate that allows for sequence information to be applied to collection objects downstream. This can be done with to_annotation_collection().

annotation :inscripta.biocantor.io.models.AnnotationCollectionModel

seqrecord :Optional[Bio.SeqRecord.SeqRecord]

alphabet :Optional[inscripta.biocantor.sequence.alphabet.Alphabet]

to_annotation_collection() → inscripta.biocantor.gene.collections.AnnotationCollection

Export to a final model. Applies the sequence information if it exists (that is, if there is a SeqRecord).

static parsed_annotation_records_to_model(annotations: Iterable[ParsedAnnotationRecord]) → Iterable[inscripta.biocantor.gene.collections.AnnotationCollection]

Convenience function for converting an iterable of ParsedAnnotationRecord objects to the object model.

Takes an iterable of biocantor.io.parser.ParsedAnnotationRecord objects and yields AnnotationCollection objects.

This incorporates sequence information into each TranscriptInterval and FeatureInterval object.

Parameters: annotations – Iterable that comes from a parser function.
Yields: AnnotationCollection with sequence information.
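
The conversion pattern described above can be sketched with standard-library stand-ins. This is a minimal illustration, not BioCantor's implementation: `SketchRecord`, its plain `dict` annotation, its `sequence` string, and `records_to_model` are all hypothetical names that only mirror the shape of ParsedAnnotationRecord, its optional SeqRecord, and the static conversion method. It assumes the conversion is lazy, as the "Yields" entry suggests.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SketchRecord:
    """Hypothetical stand-in for ParsedAnnotationRecord (not the BioCantor class)."""

    annotation: dict                # stands in for AnnotationCollectionModel
    sequence: Optional[str] = None  # stands in for the optional SeqRecord

    def to_annotation_collection(self) -> dict:
        # Copy the annotation and attach sequence information only when it exists.
        result = dict(self.annotation)
        if self.sequence is not None:
            result["sequence"] = self.sequence
        return result

    @staticmethod
    def records_to_model(records: Iterable["SketchRecord"]) -> Iterator[dict]:
        # Generator: each record is converted only when the caller asks for it.
        for record in records:
            yield record.to_annotation_collection()


records = [SketchRecord({"name": "chr1"}, "ACGT"), SketchRecord({"name": "chr2"})]
collections = list(SketchRecord.records_to_model(records))
```

In this sketch the second record has no sequence, so its converted form carries the annotation alone, which mirrors the optional nature of the seqrecord attribute.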

to_fasta(fasta_file_handle: TextIO)

Convenience function that writes the associated SeqRecord in this record to FASTA.

Parameters: fasta_file_handle – Open file handle to write to.
Raises: FastaExportError – If the associated SeqRecord is null.
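
The contract above (write to an already open handle, fail when there is no sequence) can be illustrated without BioCantor installed. The sketch below is an assumption-laden stand-in: `write_fasta`, `SketchFastaExportError`, and the `width` parameter are hypothetical and are not part of the library, and the real method may format its output differently.

```python
import io
from typing import Optional, TextIO


class SketchFastaExportError(Exception):
    """Hypothetical stand-in for the FastaExportError named above."""


def write_fasta(seq_id: str, sequence: Optional[str], handle: TextIO, width: int = 60) -> None:
    # Refuse to export when no sequence is attached to the record.
    if sequence is None:
        raise SketchFastaExportError("no sequence to export")
    # Header line, then the sequence wrapped at `width` characters per line.
    handle.write(f">{seq_id}\n")
    for i in range(0, len(sequence), width):
        handle.write(sequence[i:i + width] + "\n")


# Any text handle works; an in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_fasta("chr1", "ACGTACGT", buffer, width=4)
fasta_text = buffer.getvalue()
```

Taking a handle rather than a path leaves opening and closing the file to the caller, so several records can be appended to one file.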

biocantor.io.parser.seq_to_parent(seq: str, alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet] = Alphabet.NT_EXTENDED_GAPPED, seq_id: Optional[str] = None, seq_type: Optional[str] = SequenceType.CHROMOSOME) → inscripta.biocantor.parent.Parent

Convert a string into a Parent object. This is the intermediate that transfers a BioPython sequence object to a BioCantor sequence object.

NOTE: This sequence is assumed to be the entire chromosome.

Parameters

seq – String of sequence.
alphabet – Alphabet this sequence is in.
seq_id – ID to attach to the Parent.
seq_type – Sequence type to attach to the Parent.

Returns

A Parent object.

biocantor.io.parser.seq_chunk_to_parent(seq: str, sequence_name: Union[uuid.UUID, str], start: int, end: int, strand: Optional[inscripta.biocantor.location.strand.Strand] = Strand.PLUS, alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet] = Alphabet.NT_EXTENDED_GAPPED) → inscripta.biocantor.parent.Parent

Construct a sequence chunk parent from a sequence. This is used when an annotation collection is being instantiated with a subset of a genome sequence.

NOTE: This sequence is assumed to be a subset of a chromosome. There is no way to validate that within this function.
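
Although the function cannot confirm that the sequence really comes from the chromosome, a caller can still check that the chunk length agrees with its coordinates before constructing the parent. The sketch below is a hypothetical helper pair, not part of BioCantor. It assumes that start and end are chromosome coordinates in a 0-based, half-open convention, which this section does not state.

```python
def check_chunk(seq: str, start: int, end: int) -> None:
    # Assumption: [start, end) is 0-based and half-open on the chromosome.
    if start < 0 or end < start:
        raise ValueError(f"invalid chunk coordinates: {start}-{end}")
    if len(seq) != end - start:
        raise ValueError(
            f"chunk length {len(seq)} does not match coordinates {start}-{end}"
        )


def to_chunk_relative(position: int, start: int, end: int) -> int:
    # Translate a chromosome position into an index into the chunk string.
    if not start <= position < end:
        raise ValueError(f"position {position} is outside chunk {start}-{end}")
    return position - start


chunk = "ACGTAC"                                  # bases 100..105 of a chromosome
check_chunk(chunk, start=100, end=106)            # passes: 6 bases, 6 coordinates
index = to_chunk_relative(102, start=100, end=106)
base = chunk[index]
```

A length check of this kind catches off-by-one errors in start or end, but it cannot detect a chunk taken from the wrong region.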