biocantor.io.parser

Core parser functionality. Contains the dataclass ParsedAnnotationRecord, which wraps annotations produced by any of the parsers together with optional sequence information.

Module Contents

Classes

ParsedAnnotationRecord

Dataclass that wraps an AnnotationCollectionModel along with an accompanying SeqRecord to store sequence information.

Functions

seq_to_parent(→ inscripta.biocantor.parent.Parent)

Convert a string into a Parent object. This is the intermediate that transfers a BioPython sequence object to a BioCantor sequence object.

seq_chunk_to_parent(→ inscripta.biocantor.parent.Parent)

Construct a sequence chunk parent from a sequence. This is used when an annotation collection is being instantiated with a subset of a genome sequence.

class biocantor.io.parser.ParsedAnnotationRecord

Dataclass that wraps an AnnotationCollectionModel along with an accompanying SeqRecord to store sequence information.

This is an intermediate representation that allows sequence information to be applied to collection objects downstream, via to_annotation_collection().

annotation: inscripta.biocantor.io.models.AnnotationCollectionModel
seqrecord: Optional[Bio.SeqRecord.SeqRecord]
alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet]
to_annotation_collection() → inscripta.biocantor.gene.collections.AnnotationCollection

Export to the final AnnotationCollection model. Sequence information is applied if it exists, that is, if a SeqRecord is present.
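The wrap-then-export pattern can be sketched with self-contained stand-ins. The classes below (StubModel, StubSeqRecord, StubCollection, StubParsedRecord) are hypothetical and only mimic the shape of AnnotationCollectionModel, SeqRecord and AnnotationCollection; they are not the library's implementation. The point illustrated is that the SeqRecord is optional and sequence is attached only when it is present.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubModel:
    """Hypothetical stand-in for AnnotationCollectionModel."""
    name: str


@dataclass
class StubSeqRecord:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""
    id: str
    seq: str


@dataclass
class StubCollection:
    """Hypothetical stand-in for AnnotationCollection."""
    name: str
    sequence: Optional[str]  # None when no sequence was attached


@dataclass
class StubParsedRecord:
    annotation: StubModel
    seqrecord: Optional[StubSeqRecord] = None

    def to_annotation_collection(self) -> StubCollection:
        # Apply sequence information only if a SeqRecord is present.
        seq = self.seqrecord.seq if self.seqrecord is not None else None
        return StubCollection(name=self.annotation.name, sequence=seq)


with_seq = StubParsedRecord(StubModel("chr1"), StubSeqRecord("chr1", "ACGT")).to_annotation_collection()
without_seq = StubParsedRecord(StubModel("chr1")).to_annotation_collection()
```

Keeping the sequence optional lets the same record type carry output from parsers that do and do not provide sequence.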

static parsed_annotation_records_to_model(annotations: Iterable[ParsedAnnotationRecord]) → Iterable[inscripta.biocantor.gene.collections.AnnotationCollection]

Convenience function for converting an iterable of ParsedAnnotationRecords to the object model.

Takes an iterable of biocantor.io.parser.ParsedAnnotationRecord objects and yields AnnotationCollection objects.

This incorporates sequence information onto each TranscriptInterval and FeatureInterval object.

Parameters

annotations – Iterable produced by a parser function.

Yields

AnnotationCollection with sequence information.
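Because the function yields, conversion can be lazy, producing one collection per input record. A minimal sketch of that generator pattern follows; StubParsedRecord and parsed_records_to_model are hypothetical, and plain dicts stand in for AnnotationCollection.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class StubParsedRecord:
    """Hypothetical stand-in for ParsedAnnotationRecord."""
    name: str
    seq: Optional[str] = None

    def to_annotation_collection(self) -> dict:
        # A plain dict stands in for AnnotationCollection.
        return {"name": self.name, "has_sequence": self.seq is not None}


def parsed_records_to_model(records: Iterable[StubParsedRecord]) -> Iterator[dict]:
    """Lazily convert each parsed record; nothing runs until iteration."""
    for record in records:
        yield record.to_annotation_collection()


collections = list(
    parsed_records_to_model([StubParsedRecord("chr1", "ACGT"), StubParsedRecord("chr2")])
)
```

A generator keeps memory use proportional to one record at a time, which matters when a parser streams many records from a large file.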

to_fasta(fasta_file_handle: TextIO)

Convenience function that writes this record's associated SeqRecord to FASTA.

Parameters

fasta_file_handle – Open file handle to write to.

Raises

FastaExportError – if the associated SeqRecord is None.
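A sketch of the documented behavior: write a FASTA entry, or raise when there is no SeqRecord. FastaExportError is redefined locally as a stand-in, the stub classes are hypothetical, and the line_width parameter is an illustrative addition that is not part of the documented signature (60 matches Biopython's default FASTA line wrapping).

```python
import io
from dataclasses import dataclass
from typing import Optional, TextIO


class FastaExportError(Exception):
    """Local stand-in for the library's FastaExportError."""


@dataclass
class StubSeqRecord:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""
    id: str
    seq: str


@dataclass
class StubParsedRecord:
    seqrecord: Optional[StubSeqRecord] = None

    def to_fasta(self, fasta_file_handle: TextIO, line_width: int = 60) -> None:
        # Mirror the documented contract: no SeqRecord means nothing to export.
        if self.seqrecord is None:
            raise FastaExportError("Cannot export FASTA: no associated SeqRecord")
        fasta_file_handle.write(f">{self.seqrecord.id}\n")
        seq = self.seqrecord.seq
        # Wrap the sequence into fixed-width lines.
        for i in range(0, len(seq), line_width):
            fasta_file_handle.write(seq[i : i + line_width] + "\n")


buf = io.StringIO()
StubParsedRecord(StubSeqRecord("chr1", "ACGTACGT")).to_fasta(buf, line_width=4)
```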

biocantor.io.parser.seq_to_parent(seq: str, alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet] = Alphabet.NT_EXTENDED_GAPPED, seq_id: Optional[str] = None, seq_type: Optional[str] = SequenceType.CHROMOSOME) → inscripta.biocantor.parent.Parent

Convert a string into a Parent object. This is the intermediate that transfers a BioPython sequence object to a BioCantor sequence object.

NOTE: This sequence is assumed to be the entire chromosome.

Parameters
  • seq – Sequence string.

  • alphabet – Alphabet this sequence is in.

  • seq_id – ID to attach to the Parent.

  • seq_type – Sequence type to attach to the Parent.

Returns

A Parent object.
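Since the sequence is assumed to be the entire chromosome, the resulting parent spans the whole string starting at position 0. A minimal sketch of that idea follows, with a hypothetical StubParent. Two assumptions apply: coordinates are 0-based half-open, and the simplified IUPAC-plus-gap letter set stands in for Alphabet.NT_EXTENDED_GAPPED, whose exact membership is not restated here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, simplified stand-in for Alphabet.NT_EXTENDED_GAPPED:
# IUPAC nucleotide codes plus the gap character.
ASSUMED_NT_GAPPED = set("ACGTURYKMSWBDHVN-")


@dataclass(frozen=True)
class StubParent:
    """Hypothetical stand-in for inscripta.biocantor.parent.Parent."""
    id: Optional[str]
    sequence_type: Optional[str]
    start: int  # 0-based, inclusive (assumed)
    end: int  # 0-based, exclusive (assumed)
    sequence: str


def stub_seq_to_parent(
    seq: str, seq_id: Optional[str] = None, seq_type: Optional[str] = "chromosome"
) -> StubParent:
    # Reject characters outside the (assumed) alphabet, case-insensitively.
    invalid = set(seq.upper()) - ASSUMED_NT_GAPPED
    if invalid:
        raise ValueError(f"characters not in alphabet: {sorted(invalid)}")
    # The whole string is treated as the full chromosome: it spans [0, len(seq)).
    return StubParent(id=seq_id, sequence_type=seq_type, start=0, end=len(seq), sequence=seq)


parent = stub_seq_to_parent("ACGTN-acg", seq_id="chr1")
```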

biocantor.io.parser.seq_chunk_to_parent(seq: str, sequence_name: Union[uuid.UUID, str], start: int, end: int, strand: Optional[inscripta.biocantor.location.strand.Strand] = Strand.PLUS, alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet] = Alphabet.NT_EXTENDED_GAPPED) → inscripta.biocantor.parent.Parent

Construct a sequence chunk parent from a sequence. This is used when an annotation collection is being instantiated with a subset of a genome sequence.

NOTE: This sequence is assumed to be a subset of a chromosome. There is no way to validate that within this function.

Parameters
  • seq – Sequence subset to use.

  • sequence_name – The name of the sequence.

  • start – The genomic start position of this sequence.

  • end – The genomic end position of this sequence.

  • strand – The strand of this chunk relative to the genome.

  • alphabet – The alphabet the sequence is in.

Returns

An instantiated Parent object ready to be passed to a constructor.
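Because a chunk covers only part of a chromosome, genomic coordinates must be shifted by start before indexing into the chunk's sequence. Below is a sketch under stated assumptions. StubChunkParent is hypothetical, coordinates are taken as 0-based half-open, and strand is omitted. The length check is an illustrative consistency test only; as the note above says, the function cannot verify that the chunk truly belongs to the chromosome.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StubChunkParent:
    """Hypothetical stand-in for a sequence-chunk Parent."""
    sequence_name: str
    start: int  # genomic start of the chunk (0-based, inclusive; assumed)
    end: int  # genomic end of the chunk (0-based, exclusive; assumed)
    sequence: str

    def __post_init__(self) -> None:
        # Illustrative check: the chunk sequence must exactly cover [start, end).
        if len(self.sequence) != self.end - self.start:
            raise ValueError("sequence length must equal end - start")

    def base_at(self, genomic_pos: int) -> str:
        """Return the base at a genomic coordinate by shifting into chunk space."""
        if not self.start <= genomic_pos < self.end:
            raise IndexError("position outside this chunk")
        return self.sequence[genomic_pos - self.start]


chunk = StubChunkParent("chr1", start=100, end=108, sequence="ACGTTGCA")
```

Here genomic position 103 maps to index 3 of the chunk string.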