`biocantor.io.features`

This module contains code shared between the GenBank and GFF3 parsers for extracting feature information.

The regexes, enums, and functions defined here identify a feature's primary name and ID values and extract its possible feature types.

Package Contents

Classes

`FeatureIntervalNameQualifiers`	Keys that will be looked for to find human-readable identifier(s) for a feature.
`FeatureIntervalIDQualifiers`	Keys that will be looked for to find identifiers for a feature.

Functions

`extract_feature_types`(feature_types, feature_qualifiers)	Extracts feature types from a qualifiers dictionary.
`extract_feature_name_id`(→ Tuple[str, str])	Extract primary feature name and ID from a qualifiers dictionary.
`merge_qualifiers`(→ Dict[Hashable, List[str]])	Merges two dicts of lists using sets.

Attributes

`FEATURE_INTERVAL_NAME_QUALIFIERS`
`FEATURE_INTERVAL_NAME_QUALIFIERS_REGEX`
`FEATURE_INTERVAL_ID_QUALIFIERS`
`FEATURE_INTERVAL_ID_QUALIFIERS_REGEX`
`FEATURE_TYPE_IDENTIFIERS`
`FEATURE_TYPE_IDENTIFIERS_REGEX`

class biocantor.io.features.FeatureIntervalNameQualifiers

Bases: enum.IntEnum

Keys that will be looked for to find human-readable identifier(s) for a feature. Keys are checked case-insensitively.

The ordering of this enumeration matters: when several keys are present, the key-value pair whose enum value is smallest takes priority.

To leave room for future additions, the values of this enum are spaced 10 apart, so that up to 9 new items can be inserted at each level without renumbering existing members (NAME = 15 occupies one such slot).

NOTE: If names are added to or changed in this enum, you must also change FEATURE_INTERVAL_NAME_QUALIFIERS.

FEATURE_NAME = 0

STANDARD_NAME = 10

NAME = 15

GENE = 20

GENE_NAME = 30

LABEL = 40

OPERON = 50
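
The priority rule can be illustrated with a small, self-contained sketch. The `NameQualifiers` class below is a hypothetical stand-in that only mirrors the member values listed above; it is not imported from `biocantor`, and the real module may resolve priority differently.

```python
from enum import IntEnum


class NameQualifiers(IntEnum):
    """Hypothetical stand-in mirroring the values documented above."""

    FEATURE_NAME = 0
    STANDARD_NAME = 10
    NAME = 15
    GENE = 20
    GENE_NAME = 30
    LABEL = 40
    OPERON = 50


# IntEnum members compare as integers, so sorting yields priority order.
by_priority = sorted(NameQualifiers)
print([member.name for member in by_priority])

# Of the keys actually present on a feature, the smallest value wins.
present = [NameQualifiers.LABEL, NameQualifiers.GENE]
best = min(present)
print(best.name)
```

Because `NAME = 15` sits between `STANDARD_NAME` and `GENE`, it was slotted in without renumbering any neighbour, which is the purpose of the gaps of 10.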

class biocantor.io.features.FeatureIntervalIDQualifiers

Bases: enum.IntEnum

Keys that will be looked for to find identifiers for a feature.

The ordering of this enumeration matters, because the key-value pair found with the smallest value is the most important.

To leave room for future additions, the values of this enum are spaced apart so that new items can be inserted between existing levels without renumbering them.

NOTE: If names are added to or changed in this enum, you must also change FEATURE_INTERVAL_ID_QUALIFIERS.

FEATURE_ID = 0

ID = 255

biocantor.io.features.FEATURE_INTERVAL_NAME_QUALIFIERS

biocantor.io.features.FEATURE_INTERVAL_NAME_QUALIFIERS_REGEX

biocantor.io.features.FEATURE_INTERVAL_ID_QUALIFIERS

biocantor.io.features.FEATURE_INTERVAL_ID_QUALIFIERS_REGEX

biocantor.io.features.FEATURE_TYPE_IDENTIFIERS

biocantor.io.features.FEATURE_TYPE_IDENTIFIERS_REGEX

biocantor.io.features.extract_feature_types(feature_types: Set[str], feature_qualifiers: Dict[str, List[str]])

Extracts feature types from a qualifiers dictionary.

Parameters

feature_types – Set of feature types. Starts with the primary feature type.
feature_qualifiers – Qualifiers associated with a record from a GenBank parsing event.

biocantor.io.features.extract_feature_name_id(feature_qualifiers: Dict[str, List[str]]) → Tuple[str, str]

Extract primary feature name and ID from a qualifiers dictionary based on the priority hierarchies in FeatureIntervalNameQualifiers and FeatureIntervalIDQualifiers.

If there is more than one value for a key, the first is chosen.

Parameters: feature_qualifiers – Qualifiers associated with a record from a GenBank parsing event.
Returns: The primary identifier name and ID.
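
As a rough illustration of the lookup described above, the sketch below picks the highest-priority key that is present and returns the first of its values. Everything in it is an assumption made for illustration: `_NameKeys` and `_IDKeys` are abbreviated stand-ins for the documented enums, `first_by_priority` and `sketch_extract_name_id` are hypothetical names, and returning `None` for a missing key and matching ID keys case-insensitively are guesses, not documented behavior.

```python
from enum import IntEnum
from typing import Dict, List, Optional, Tuple, Type


class _NameKeys(IntEnum):
    """Abbreviated, hypothetical stand-in for the name qualifier enum."""

    STANDARD_NAME = 10
    GENE = 20
    LABEL = 40


class _IDKeys(IntEnum):
    """Hypothetical stand-in for the ID qualifier enum."""

    FEATURE_ID = 0
    ID = 255


def first_by_priority(
    qualifiers: Dict[str, List[str]], keys: Type[IntEnum]
) -> Optional[str]:
    """Return the first value of the highest-priority key that is present."""
    # Lower-case the qualifier keys so matching is case-insensitive.
    lowered = {key.lower(): values for key, values in qualifiers.items()}
    for member in sorted(keys):  # smallest enum value first
        values = lowered.get(member.name.lower())
        if values:
            return values[0]  # more than one value: take the first
    return None  # assumption: nothing matched


def sketch_extract_name_id(
    qualifiers: Dict[str, List[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical sketch, not the library implementation."""
    return (
        first_by_priority(qualifiers, _NameKeys),
        first_by_priority(qualifiers, _IDKeys),
    )


example = {"Label": ["my label"], "gene": ["abcD", "abcD-alt"], "ID": ["feat-1"]}
name, feature_id = sketch_extract_name_id(example)
print(name, feature_id)
```

Here `gene` outranks `Label` because `GENE = 20` is smaller than `LABEL = 40`, and only the first of the two `gene` values is used.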

biocantor.io.features.merge_qualifiers(qualifiers: Dict[Hashable, List[str]], other_qualifiers: Dict[Hashable, List[str]]) → Dict[Hashable, List[str]]

Merges two dicts of lists using sets.

Could be made more efficient, probably.
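
A minimal sketch of merging two dicts of lists via sets is shown below. `sketch_merge_qualifiers` is a hypothetical name, and sorting the merged values is an assumption added here to make the output deterministic; the library function may order values differently.

```python
from typing import Dict, Hashable, List


def sketch_merge_qualifiers(
    qualifiers: Dict[Hashable, List[str]],
    other_qualifiers: Dict[Hashable, List[str]],
) -> Dict[Hashable, List[str]]:
    """Hypothetical sketch: union the value lists of two qualifier dicts."""
    merged: Dict[Hashable, List[str]] = {}
    # Visit every key that appears in either input.
    for key in qualifiers.keys() | other_qualifiers.keys():
        values = set(qualifiers.get(key, [])) | set(other_qualifiers.get(key, []))
        # Sets are unordered; sorting is an assumption for stable output.
        merged[key] = sorted(values)
    return merged


left = {"gene": ["abcD"], "note": ["first"]}
right = {"gene": ["abcD", "abcE"], "locus_tag": ["X_001"]}
result = sketch_merge_qualifiers(left, right)
print(result)
```

Duplicate values such as `abcD` collapse to a single entry, and keys that appear in only one input are carried over unchanged.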
