biocantor.io.genbank.writer
Writes GenBank formatted files, given one or more AnnotationCollection objects.
These models must have sequences associated with them.
There are two flavors of GenBank supported here:
1. Prokaryotic – each gene has a single direct descendant, which holds its intervals. In other words, all annotations come in pairs where you have gene followed by [CDS, tRNA, rRNA, ...].
2. Eukaryotic – a more standard gene model, where the top level is always called gene, then there is a child [mRNA, tRNA, ...], and in the case where the child is mRNA, there are CDS features.
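The two flavors differ only in which feature keys are written for a gene. The sketch below spells that out for a gene with one child, following the list above and the mode descriptions under transcripts_to_feature. `Flavor` is a hypothetical stand-in for `GenbankFlavor`, and `feature_keys` is an illustrative helper, not part of this module.

```python
from enum import Enum
from typing import List


class Flavor(Enum):
    """Hypothetical stand-in for GenbankFlavor."""
    PROKARYOTIC = "prokaryotic"
    EUKARYOTIC = "eukaryotic"


def feature_keys(flavor: Flavor, biotype: str, coding: bool) -> List[str]:
    """Feature keys written, in order, for one gene with a single child."""
    if flavor is Flavor.PROKARYOTIC:
        # gene is followed directly by its interval feature: CDS, tRNA, rRNA, ...
        return ["gene", "CDS"] if coding else ["gene", biotype]
    # eukaryotic: coding genes get an mRNA child, which in turn gets a CDS
    return ["gene", "mRNA", "CDS"] if coding else ["gene", biotype]
```

For example, a eukaryotic coding gene yields `["gene", "mRNA", "CDS"]`, while a prokaryotic tRNA gene yields `["gene", "tRNA"]`.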
Module Contents
Functions
- biocantor.io.genbank.writer.collection_to_genbank(collections: List[inscripta.biocantor.gene.AnnotationCollection], genbank_file_handle_or_path: Union[TextIO, str, pathlib.Path], genbank_type: Optional[inscripta.biocantor.io.genbank.constants.GenbankFlavor] = GenbankFlavor.PROKARYOTIC, force_strand: Optional[bool] = True, organism: Optional[str] = None, source: Optional[str] = None, seqrecord_annotations: Optional[List[Dict[str, Any]]] = None, update_translations: bool = False)
Take a list of instantiated AnnotationCollection objects and produce a GenBank file.
- Parameters
collections – Iterable of AnnotationCollections. They must have sequences associated with them.
genbank_file_handle_or_path – Open file handle or path to write the GenBank file to.
genbank_type – Are we writing a prokaryotic or eukaryotic style GenBank file?
force_strand – Boolean flag; if True, then strand on children is forced; if False, then improper strands are instead skipped.
organism – What string to put in the ORGANISM field? If not set, will be a period.
source – What string to put in the SOURCE field? If not set, will be the basename of the GenBank path.
seqrecord_annotations – A list of arbitrary annotation dictionaries to include, one per collection; it must be the same length as collections. If organism or source are set both in this function call and in this dictionary, they will be overwritten.
update_translations – Should the /translation tag be calculated or re-calculated? This is a time-consuming process.
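The defaults for organism and source, and the length requirement on seqrecord_annotations, can be sketched as two small checks. Both helpers are hypothetical. Reading "basename" as `os.path.basename` (extension kept) is an assumption, and the behaviour for an open file handle is not documented, so the sketch leaves source unset in that case. The precedence rule for conflicting organism/source values is deliberately not modelled.

```python
import os
import pathlib
from typing import Any, Dict, List, Optional, TextIO, Tuple, Union


def resolve_header(
    organism: Optional[str],
    source: Optional[str],
    path_or_handle: Union[TextIO, str, pathlib.Path],
) -> Tuple[str, Optional[str]]:
    """Apply the documented ORGANISM / SOURCE defaults (hypothetical helper)."""
    # ORGANISM falls back to a period
    organism = organism if organism is not None else "."
    if source is None and isinstance(path_or_handle, (str, pathlib.Path)):
        # assumption: "basename" means os.path.basename, so the extension is kept
        source = os.path.basename(str(path_or_handle))
    return organism, source


def check_annotations(
    collections: List[Any], seqrecord_annotations: Optional[List[Dict[str, Any]]]
) -> None:
    """seqrecord_annotations, when given, must pair one-to-one with collections."""
    if seqrecord_annotations is not None and len(seqrecord_annotations) != len(collections):
        raise ValueError("seqrecord_annotations must be the same length as collections")
```

With a path of `out/genome.gbk` and neither field set, the sketch yields `(".", "genome.gbk")`.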
- biocantor.io.genbank.writer.gene_to_feature(gene_or_feature: Union[inscripta.biocantor.gene.GeneInterval, inscripta.biocantor.gene.FeatureIntervalCollection], genbank_type: inscripta.biocantor.io.genbank.constants.GenbankFlavor, force_strand: bool, translation_table: inscripta.biocantor.gene.TranslationTable, update_translations: bool) → Iterable[Bio.SeqFeature.SeqFeature]
Converts either a GeneInterval or a FeatureIntervalCollection to Bio.SeqFeature.SeqFeature objects. Bio.SeqFeature.SeqFeature objects are Biopython objects that will then be used to write to a GenBank file. There is one Bio.SeqFeature.SeqFeature for every feature, or row group, in the output file. There will be one contiguous interval at the gene level.
While GeneInterval always has its interval on the plus strand, GenBank files assume that a gene has an explicit strand. Therefore, this function picks the most common strand and forces it on all of its children.
- Parameters
gene_or_feature – A GeneInterval or FeatureIntervalCollection.
genbank_type – Are we writing a prokaryotic or eukaryotic style GenBank file?
force_strand – Boolean flag; if True, then strand on children is forced; if False, then improper strands are instead skipped.
translation_table – Translation table to use.
update_translations – Should the /translation tag be calculated or re-calculated? This is a time-consuming process.
- Yields
SeqFeature objects: one for the gene, one for each child transcript, and one for each transcript’s CDS, if it exists.
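Picking "the most common strand" among a gene's children is a majority vote. A minimal sketch with `collections.Counter`, assuming a hypothetical `Strand` enum in place of biocantor's own. The tie-break shown (first strand encountered wins, which is how `Counter.most_common` orders equal counts) is an assumption; the real rule is not documented here.

```python
from collections import Counter
from enum import Enum
from typing import Iterable


class Strand(Enum):
    """Hypothetical stand-in for biocantor's Strand."""
    PLUS = 1
    MINUS = -1
    UNSTRANDED = 0


def most_common_strand(child_strands: Iterable[Strand]) -> Strand:
    """Return the strand shared by the most children.

    Ties go to the strand seen first, because Counter.most_common orders
    equal counts by first appearance.
    """
    counts = Counter(child_strands)
    if not counts:
        raise ValueError("gene has no children")
    return counts.most_common(1)[0][0]
```

For children on `[MINUS, PLUS, MINUS]`, the gene is written on `MINUS`, and that strand is then forced on all children.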
- biocantor.io.genbank.writer.transcripts_to_feature(transcripts: List[inscripta.biocantor.gene.TranscriptInterval], strand: inscripta.biocantor.location.strand.Strand, genbank_type: inscripta.biocantor.io.genbank.constants.GenbankFlavor, force_strand: bool, translation_table: inscripta.biocantor.gene.TranslationTable, gene_symbol: Optional[str] = None, locus_tag: Optional[str] = None, update_translations: bool = False) → Iterable[Bio.SeqFeature.SeqFeature]
Converts a list of TranscriptInterval objects to Bio.SeqFeature.SeqFeature objects. Bio.SeqFeature.SeqFeature objects are Biopython objects that will then be used to write to a GenBank file. There is one Bio.SeqFeature.SeqFeature for every feature, or row group, in the output file. There will be one joined interval at the transcript level representing the exonic structure.
While transcript members of a gene can have different strands, GenBank files do not allow this. This function will explicitly force the strand and emit a warning that this is happening.
In eukaryotic mode, this function will create mRNA features for coding genes, and biotype features for non-coding genes. Coding genes are then passed on to create CDS features.
In prokaryotic mode, this function will only create biotype features for non-coding genes.
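The force_strand flag chooses between two outcomes for a child whose strand disagrees with the gene's: rewrite it with a warning, or drop it. The sketch below assumes a hypothetical `Child` record with an integer strand (+1 or -1); `apply_strand` is illustrative only.

```python
import warnings
from dataclasses import dataclass, replace
from typing import Iterator, List


@dataclass(frozen=True)
class Child:
    """Hypothetical minimal child record: a name and a strand (+1 or -1)."""
    name: str
    strand: int


def apply_strand(children: List[Child], strand: int, force_strand: bool) -> Iterator[Child]:
    """Yield children on the gene's strand, forcing or skipping the rest."""
    for child in children:
        if child.strand == strand:
            yield child
        elif force_strand:
            # force the gene's strand onto the child and report it
            warnings.warn(f"Forcing strand {strand} on {child.name}")
            yield replace(child, strand=strand)
        # otherwise: improper strand with force_strand=False, so the child is skipped
```

With children on +1 and -1 and a gene on +1, `force_strand=True` yields both children on +1 (with one warning), while `force_strand=False` yields only the first.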
- Parameters
transcripts – A list of TranscriptInterval objects.
strand – Strand that this gene lives on.
genbank_type – Are we writing a prokaryotic or eukaryotic style GenBank file?
force_strand – Boolean flag; if True, then strand is forced; if False, then improper strands are instead skipped.
gene_symbol – An optional gene symbol.
locus_tag – An optional locus tag.
translation_table – Translation table to use.
update_translations – Should the /translation tag be calculated or re-calculated? This is a time-consuming process.
- Yields
SeqFeature objects: one for each transcript and then one for each CDS of the transcript, if it exists.
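Because gene_symbol and locus_tag are optional, a writer has to decide which qualifiers appear at all. A minimal sketch, assuming unset values are simply omitted and that gene_symbol maps to the standard /gene qualifier and locus_tag to /locus_tag; neither mapping is confirmed by this page. List-valued entries mirror the `Dict[Hashable, List[Hashable]]` type used for transcript_qualifiers. `build_qualifiers` is hypothetical.

```python
from typing import Dict, List, Optional


def build_qualifiers(
    gene_symbol: Optional[str] = None, locus_tag: Optional[str] = None
) -> Dict[str, List[str]]:
    """Collect only the optional identifiers that were actually provided."""
    qualifiers: Dict[str, List[str]] = {}
    if gene_symbol is not None:
        # assumption: the gene symbol is written as /gene
        qualifiers["gene"] = [gene_symbol]
    if locus_tag is not None:
        qualifiers["locus_tag"] = [locus_tag]
    return qualifiers
```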
- biocantor.io.genbank.writer.add_cds_feature(transcript: inscripta.biocantor.gene.TranscriptInterval, transcript_qualifiers: Dict[Hashable, List[Hashable]], strand: inscripta.biocantor.location.strand.Strand, translation_table: inscripta.biocantor.gene.TranslationTable, update_translations: bool) → Bio.SeqFeature.SeqFeature
Converts a TranscriptInterval that has a CDS to a Bio.SeqFeature.SeqFeature that represents the spliced CDS interval.
- Parameters
transcript – A TranscriptInterval.
strand – Strand that this transcript lives on.
transcript_qualifiers – Qualifiers dictionary from the transcript-level feature.
translation_table – Translation table to use.
update_translations – Should the /translation tag be calculated or re-calculated? This is a time-consuming process.
- Returns
SeqFeature for the CDS of this transcript.
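Computing /translation means translating the spliced CDS codon by codon, which is part of why update_translations is described as time-consuming on large collections. The sketch below hard-codes only NCBI translation table 1 (the standard code) rather than using the translation_table argument. It assumes the input is already spliced and oriented 5' to 3' on the coding strand. Mapping unknown codons to "X" is an assumption. It drops a terminal stop, since GenBank /translation values conventionally omit it.

```python
BASES = "TCAG"
# NCBI translation table 1, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(spliced_cds: str) -> str:
    """Translate a spliced 5'->3' CDS, dropping a terminal stop codon."""
    seq = spliced_cds.upper()
    if len(seq) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    # codons containing non-ACGT characters become "X" (assumption)
    protein = "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq), 3))
    return protein[:-1] if protein.endswith("*") else protein
```

For example, `translate_cds("ATGGCCTAA")` gives `"MA"`.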
- biocantor.io.genbank.writer.feature_intervals_to_features(features: List[inscripta.biocantor.gene.FeatureInterval], strand: inscripta.biocantor.location.strand.Strand, force_strand: bool, feature_name: Optional[str] = None, locus_tag: Optional[str] = None) → Iterable[Bio.SeqFeature.SeqFeature]
Converts a list of FeatureInterval objects to Bio.SeqFeature.SeqFeature objects. Bio.SeqFeature.SeqFeature objects are Biopython objects that will then be used to write to a GenBank file. There is one Bio.SeqFeature.SeqFeature for every feature, or row group, in the output file. There will be one joined interval at the transcript level representing the exonic structure.
While transcript members of a gene can have different strands, GenBank files do not allow this. This function will explicitly force the strand and emit a warning that this is happening.
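A "joined interval" ends up in the file as a GenBank location string such as `join(1..100,201..300)`, wrapped in `complement(...)` on the minus strand. The sketch below assumes the blocks arrive as 0-based, half-open `(start, end)` pairs (an assumption about the input, not something this page states) and converts them to GenBank's 1-based, inclusive coordinates. `genbank_location` is a hypothetical helper; Biopython produces these strings itself when writing a SeqFeature.

```python
from typing import List, Tuple


def genbank_location(blocks: List[Tuple[int, int]], minus_strand: bool) -> str:
    """Format 0-based, half-open (start, end) blocks as a GenBank location string."""
    if not blocks:
        raise ValueError("at least one block is required")
    # GenBank coordinates are 1-based and inclusive: [start, end) -> start+1..end
    spans = [f"{start + 1}..{end}" for start, end in sorted(blocks)]
    location = spans[0] if len(spans) == 1 else "join(" + ",".join(spans) + ")"
    # minus-strand features keep ascending blocks inside complement(...)
    return f"complement({location})" if minus_strand else location
```

Two exons at `(0, 100)` and `(200, 300)` on the minus strand become `complement(join(1..100,201..300))`.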
- Parameters
features – A list of FeatureInterval objects.
strand – Strand that this gene lives on.
force_strand – Boolean flag; if True, then strand is forced; if False, then improper strands are instead skipped.
feature_name – An optional feature name.
locus_tag – An optional locus tag.
- Yields
A SeqFeature for each feature.