biocantor.io.parser
Core parser functionality. Contains the dataclass ParsedAnnotationRecord, which wraps annotations produced by any of the parsers with optional sequence information.
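Module Contents
Classes
- ParsedAnnotationRecord: Dataclass that wraps an AnnotationCollectionModel along with an accompanying SeqRecord to store sequence information.
Functions
- seq_to_parent: Convert a string into a Parent object. This is the intermediate that transfers a BioPython sequence object to a BioCantor sequence object.
- seq_chunk_to_parent: Construct a sequence chunk parent from a sequence. This is used when an annotation collection is being instantiated with a subset of a genome sequence.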
- class biocantor.io.parser.ParsedAnnotationRecord
Dataclass that wraps an AnnotationCollectionModel along with an accompanying SeqRecord to store sequence information.
This is an intermediate that allows sequence information to be applied to collection objects downstream. This can be done with to_annotation_collection().
- annotation: inscripta.biocantor.io.models.AnnotationCollectionModel
- seqrecord: Optional[Bio.SeqRecord.SeqRecord]
- alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet]
- to_annotation_collection() → inscripta.biocantor.gene.collections.AnnotationCollection
Export to a final model. The sequence information is applied if it exists (that is, if there is a SeqRecord).
- static parsed_annotation_records_to_model(annotations: Iterable[ParsedAnnotationRecord]) → Iterable[inscripta.biocantor.gene.collections.AnnotationCollection]
Convenience function for converting an iterable of ParsedAnnotationRecord objects to the object model.
Takes an iterable of biocantor.io.parser.ParsedAnnotationRecord and yields AnnotationCollection objects. This incorporates sequence information onto each TranscriptInterval and FeatureInterval object.
- Parameters
annotations – Iterable that comes from a parser function.
- Yields
AnnotationCollection objects with sequence information.
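The pattern described above, a parse-time record that carries an optional sequence and is lazily converted into a final model, can be sketched with the standard library alone. Everything in this block (`SketchRecord`, `SketchCollection`, `to_collection`, `records_to_models`) is a hypothetical stand-in and not BioCantor's implementation; it only illustrates the optional-sequence handling and the generator-style conversion.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class SketchCollection:
    """Hypothetical final model: a name plus an optional sequence."""
    name: str
    sequence: Optional[str] = None


@dataclass
class SketchRecord:
    """Hypothetical intermediate: a parsed annotation plus an optional sequence."""
    name: str
    sequence: Optional[str] = None

    def to_collection(self) -> SketchCollection:
        # The sequence is carried over only if one was parsed with the annotation;
        # otherwise the final model is built without sequence information.
        return SketchCollection(self.name, self.sequence)

    @staticmethod
    def records_to_models(records: Iterable["SketchRecord"]) -> Iterator[SketchCollection]:
        # Convert lazily, one record at a time, as a generator.
        for record in records:
            yield record.to_collection()


records = [SketchRecord("chr1", "ACGT"), SketchRecord("chr2")]
models = list(SketchRecord.records_to_models(records))
```

Because the helper is a generator, nothing is converted until the result is iterated, which keeps memory use flat when a parser emits many records.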
- to_fasta(fasta_file_handle: TextIO)
Convenience function that writes the associated SeqRecord in this record to FASTA.
- Parameters
fasta_file_handle – Open file handle to write to.
- Raises
FastaExportError – if the associated SeqRecord is null.
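As a rough illustration of the behaviour documented for to_fasta(), the sketch below writes a sequence to an already-open text handle and raises when no sequence is available. The names `write_fasta` and `SketchFastaExportError`, and the line-wrapping `width` parameter, are assumptions made for this example; BioCantor's own method may format its output differently.

```python
import io
from typing import Optional, TextIO


class SketchFastaExportError(Exception):
    """Hypothetical stand-in for the export error raised on a missing sequence."""


def write_fasta(seq_id: str, sequence: Optional[str], handle: TextIO, width: int = 60) -> None:
    # Refuse to export when there is no sequence, mirroring the documented Raises clause.
    if sequence is None:
        raise SketchFastaExportError("no sequence available to export")
    # FASTA header line, then the sequence wrapped at `width` characters per line.
    handle.write(f">{seq_id}\n")
    for i in range(0, len(sequence), width):
        handle.write(sequence[i:i + width] + "\n")


# Any open text handle works; an in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_fasta("chr1", "ACGTACGT", buffer, width=4)
fasta_text = buffer.getvalue()
```

Taking an open handle rather than a path lets the caller decide where the output goes (a file, a compressed stream, or a buffer) and lets several records be appended to the same file.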
- biocantor.io.parser.seq_to_parent(seq: str, alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet] = Alphabet.NT_EXTENDED_GAPPED, seq_id: Optional[str] = None, seq_type: Optional[str] = SequenceType.CHROMOSOME) → inscripta.biocantor.parent.Parent
Convert a string into a Parent object. This is the intermediate that transfers a BioPython sequence object to a BioCantor sequence object.
NOTE: This sequence is assumed to be the entire chromosome.
- Parameters
seq – String of sequence.
alphabet – Alphabet this sequence is in.
seq_id – ID to attach to the Parent.
seq_type – Sequence type to attach to the Parent.
- Returns
A Parent object.
- biocantor.io.parser.seq_chunk_to_parent(seq: str, sequence_name: Union[uuid.UUID, str], start: int, end: int, strand: Optional[inscripta.biocantor.location.strand.Strand] = Strand.PLUS, alphabet: Optional[inscripta.biocantor.sequence.alphabet.Alphabet] = Alphabet.NT_EXTENDED_GAPPED) → inscripta.biocantor.parent.Parent
Construct a sequence chunk parent from a sequence. This is used when an annotation collection is being instantiated with a subset of a genome sequence.
NOTE: This sequence is assumed to be a subset of a chromosome. There is no way to validate that within this function.
- Parameters
seq – Sequence subset to use.
sequence_name – The name of the sequence.
start – The genomic start position of this sequence.
end – The genomic end position of this sequence.
strand – The strand of this chunk relative to the genome.
alphabet – The alphabet the sequence is in.
- Returns
An instantiated Parent object ready to be passed to a constructor.
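Since the function cannot validate that the chunk really is a subset of a chromosome, it can help to check the coordinate bookkeeping before calling it. The sketch below assumes 0-based, half-open `[start, end)` genomic coordinates and a plus-strand chunk; both are assumptions of this example, not something this page states. The helper `genomic_to_chunk` is hypothetical.

```python
def genomic_to_chunk(pos: int, start: int, end: int) -> int:
    """Map a genomic position to an index into the chunk string.

    Assumes 0-based, half-open [start, end) coordinates on the plus strand.
    """
    if not start <= pos < end:
        raise ValueError(f"position {pos} is outside the chunk [{start}, {end})")
    return pos - start


genome = "AAAACCCCGGGGTTTT"
start, end = 4, 12
chunk = genome[start:end]

# The chunk length must agree with the genomic interval it claims to cover.
assert len(chunk) == end - start

# A genomic position inside the chunk resolves to the same base in both strings.
idx = genomic_to_chunk(8, start, end)
base = chunk[idx]
assert base == genome[8]
```

A length check like `len(chunk) == end - start` is a cheap guard against passing mismatched `seq`, `start`, and `end` values.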