biocantor.io.features
This module contains code shared between the GenBank and GFF3 parsers for extracting feature information.
These regexes, enums, and functions identify primary name and ID values and pull out possible feature types.
Package Contents
Classes
- FeatureIntervalNameQualifiers: Keys that will be looked for to find human readable identifier(s) for a feature.
- FeatureIntervalIDQualifiers: Keys that will be looked for to find identifiers for a feature.
Functions
- extract_feature_types: Extracts feature types from a qualifiers dictionary.
- extract_feature_name_id: Extract primary feature name and ID from a qualifiers dictionary based on the hierarchy in FeatureIntervalNameQualifiers.
- Merges two dicts of lists using sets.
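The merge described in the last function entry can be sketched as follows. This is a minimal illustration, not the library's implementation: the name `merge_dicts_of_lists` is hypothetical, and returning sorted lists is an assumption made here only so the output is deterministic.

```python
from typing import Dict, List


def merge_dicts_of_lists(
    a: Dict[str, List[str]], b: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Hypothetical helper: union the value lists of two dicts, key by key."""
    merged: Dict[str, List[str]] = {}
    for key in a.keys() | b.keys():
        # A set drops values that appear in both inputs.
        values = set(a.get(key, [])) | set(b.get(key, []))
        # Sorting is an assumption of this sketch, used only for stable output.
        merged[key] = sorted(values)
    return merged
```

Because sets are used, duplicate values collapse and the original list order is not preserved.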
Attributes
- class biocantor.io.features.FeatureIntervalNameQualifiers
Bases: enum.IntEnum
Keys that will be looked for to find human readable identifier(s) for a feature. Will be checked in a case-insensitive fashion.
The ordering of this enumeration matters, because the key-value pair found with the smallest value is the most important.
To stay future-proof, the values of this enum are generally spaced 10 apart, so that up to 9 new items can be inserted at every level without changing the existing values (NAME = 15 sits in such a gap).
NOTE: If names are added to or changed in this enum, you must also change FEATURE_INTERVAL_NAME_QUALIFIERS.
- FEATURE_NAME = 0
- STANDARD_NAME = 10
- NAME = 15
- GENE = 20
- GENE_NAME = 30
- LABEL = 40
- OPERON = 50
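As an illustration of how the ordering above can drive a case-insensitive lookup, the sketch below ranks the keys of a qualifiers dictionary. `NameQualifiers` is a local stand-in that copies the values listed above, and `best_name_key` is a hypothetical helper; neither is part of the library.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class NameQualifiers(IntEnum):
    """Local stand-in mirroring the values listed above (not the library class)."""

    FEATURE_NAME = 0
    STANDARD_NAME = 10
    NAME = 15
    GENE = 20
    GENE_NAME = 30
    LABEL = 40
    OPERON = 50


def best_name_key(qualifiers: Dict[str, List[str]]) -> Optional[str]:
    """Return the qualifier key with the smallest enum value, ignoring case."""
    candidates = [
        (NameQualifiers[key.upper()], key)
        for key in qualifiers
        if key.upper() in NameQualifiers.__members__
    ]
    if not candidates:
        return None
    # Tuples compare on the enum value first, so min() picks the top priority.
    return min(candidates)[1]
```

Since IntEnum members compare as integers, `NameQualifiers.STANDARD_NAME < NameQualifiers.NAME < NameQualifiers.GENE` holds, which is what lets NAME slot between its neighbours without renumbering.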
- class biocantor.io.features.FeatureIntervalIDQualifiers
Bases: enum.IntEnum
Keys that will be looked for to find identifiers for a feature.
The ordering of this enumeration matters, because the key-value pair found with the smallest value is the most important.
To stay future-proof, the values of this enum are spaced apart (FEATURE_ID = 0 and ID = 255), so that new items can be inserted between them without changing the existing values.
NOTE: If names are added to or changed in this enum, you must also change FEATURE_INTERVAL_ID_QUALIFIERS.
- FEATURE_ID = 0
- ID = 255
- biocantor.io.features.FEATURE_INTERVAL_NAME_QUALIFIERS
- biocantor.io.features.FEATURE_INTERVAL_NAME_QUALIFIERS_REGEX
- biocantor.io.features.FEATURE_INTERVAL_ID_QUALIFIERS
- biocantor.io.features.FEATURE_INTERVAL_ID_QUALIFIERS_REGEX
- biocantor.io.features.FEATURE_TYPE_IDENTIFIERS
- biocantor.io.features.FEATURE_TYPE_IDENTIFIERS_REGEX
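The values of these attributes are not shown on this page. As a rough sketch of why the enums and the qualifier collections have to be changed together, the example below keeps a hand-maintained key set next to a stand-in enum and checks that the two agree. The set contents, the regex construction, and the helper names are all assumptions for illustration, not the library's definitions.

```python
import re
from enum import IntEnum


class IDQualifiers(IntEnum):
    """Local stand-in mirroring FeatureIntervalIDQualifiers as listed above."""

    FEATURE_ID = 0
    ID = 255


# Hand-maintained key set (hypothetical contents), kept separately from the enum.
ID_KEYS = {"feature_id", "id"}

# Case-insensitive pattern built from the key set (assumed construction).
ID_KEYS_REGEX = re.compile(
    "|".join(re.escape(key) for key in sorted(ID_KEYS)), re.IGNORECASE
)


def keys_in_sync() -> bool:
    """True when the hand-maintained set matches the enum member names."""
    return ID_KEYS == {member.name.lower() for member in IDQualifiers}


def is_id_key(key: str) -> bool:
    """True when the whole key matches one of the known ID keys, ignoring case."""
    return ID_KEYS_REGEX.fullmatch(key) is not None
```

A check like `keys_in_sync()` could run in a unit test to catch an enum change that was not mirrored in the key set.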
- biocantor.io.features.extract_feature_types(feature_types: Set[str], feature_qualifiers: Dict[str, List[str]])
Extracts feature types from a qualifiers dictionary.
- Parameters
feature_types – Set of feature types. Starts with the primary feature type.
feature_qualifiers – Qualifiers associated with a record from a GenBank parsing event.
- biocantor.io.features.extract_feature_name_id(feature_qualifiers: Dict[str, List[str]]) → Tuple[str, str]
Extract primary feature name and ID from a qualifiers dictionary based on the hierarchy in FeatureIntervalNameQualifiers.
If there is more than one value for a key, the first is chosen.
- Parameters
feature_qualifiers – Qualifiers associated with a record from a GenBank parsing event.
- Returns
A tuple of the primary feature name and the primary feature ID.
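The selection rule described for this function (highest-priority key wins, first value of that key is used) can be sketched without the library as shown below. The enums are local stand-ins copying the values documented above, `first_by_priority` and `sketch_extract_name_id` are hypothetical names, and returning None when no key matches is an assumption, since the documented behavior for that case is not stated here.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Tuple, Type


class NameQualifiers(IntEnum):
    """Local stand-in for FeatureIntervalNameQualifiers."""

    FEATURE_NAME = 0
    STANDARD_NAME = 10
    NAME = 15
    GENE = 20
    GENE_NAME = 30
    LABEL = 40
    OPERON = 50


class IDQualifiers(IntEnum):
    """Local stand-in for FeatureIntervalIDQualifiers."""

    FEATURE_ID = 0
    ID = 255


def first_by_priority(
    qualifiers: Dict[str, List[str]], enum_cls: Type[IntEnum]
) -> Optional[str]:
    """Return the first value of the matching key with the smallest enum value."""
    best: Optional[Tuple[IntEnum, str]] = None
    for key, values in qualifiers.items():
        member = enum_cls.__members__.get(key.upper())  # case-insensitive match
        if member is None or not values:
            continue
        if best is None or member < best[0]:
            best = (member, values[0])  # first value wins when several exist
    return None if best is None else best[1]


def sketch_extract_name_id(
    qualifiers: Dict[str, List[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical stand-in returning (name, ID); None marks a missing value."""
    return (
        first_by_priority(qualifiers, NameQualifiers),
        first_by_priority(qualifiers, IDQualifiers),
    )
```

For example, with `{"gene": ["abcD", "alt"], "label": ["x"], "ID": ["gene-1"]}` the sketch picks `gene` over `label` for the name and takes the first listed value of each chosen key.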