biocantor.io.features

This module contains code shared between the GenBank and GFF3 parsers for extracting feature information.

These regexes, enums and functions are used to identify primary name and ID values, as well as pull out possible feature types.

Package Contents

Classes

FeatureIntervalNameQualifiers

Keys that will be looked for to find human-readable identifier(s) for a feature.

FeatureIntervalIDQualifiers

Keys that will be looked for to find identifiers for a feature.

Functions

extract_feature_types(feature_types, feature_qualifiers)

Extracts feature types from a qualifiers dictionary.

extract_feature_name_id(→ Tuple[str, str])

Extract primary feature name and ID from a qualifiers dictionary based on the hierarchy in FeatureIntervalNameKeys.

merge_qualifiers(→ Dict[Hashable, List[str]])

Merges two dicts of lists using sets.

Attributes

FEATURE_INTERVAL_NAME_QUALIFIERS

FEATURE_INTERVAL_NAME_QUALIFIERS_REGEX

FEATURE_INTERVAL_ID_QUALIFIERS

FEATURE_INTERVAL_ID_QUALIFIERS_REGEX

FEATURE_TYPE_IDENTIFIERS

FEATURE_TYPE_IDENTIFIERS_REGEX

class biocantor.io.features.FeatureIntervalNameQualifiers

Bases: enum.IntEnum

Keys that will be looked for to find human-readable identifier(s) for a feature. Keys are matched case-insensitively.

The ordering of this enumeration matters, because the key-value pair found with the smallest value is the most important.

To leave room for future additions, the values of this enum are spaced 10 apart, so that up to 9 new items can be inserted at every level without changing the existing values.

NOTE: If names are added to or changed in this enum, you must also change FEATURE_INTERVAL_NAME_QUALIFIERS.

FEATURE_NAME = 0
STANDARD_NAME = 10
NAME = 15
GENE = 20
GENE_NAME = 30
LABEL = 40
OPERON = 50
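
The priority rule described above can be illustrated with a small stand-alone sketch. It does not import biocantor: the class name NameQualifierSketch and the helper best_name_key are hypothetical, the member values are copied from the listing above, and the lookup logic is an assumption about how "smallest value wins" could be applied case-insensitively.

```python
from enum import IntEnum
from typing import Dict, List, Optional


class NameQualifierSketch(IntEnum):
    # Hypothetical local copy of the documented members and values.
    FEATURE_NAME = 0
    STANDARD_NAME = 10
    NAME = 15
    GENE = 20
    GENE_NAME = 30
    LABEL = 40
    OPERON = 50


def best_name_key(qualifiers: Dict[str, List[str]]) -> Optional[str]:
    """Return the qualifier key whose enum member has the smallest value.

    Keys are compared case-insensitively; keys that are not enum members
    are ignored. Returns None if no key matches (an assumption of this sketch).
    """
    matches = [
        (NameQualifierSketch[key.upper()], key)
        for key in qualifiers
        if key.upper() in NameQualifierSketch.__members__
    ]
    if not matches:
        return None
    # Tuples sort by enum value first, so min() picks the highest priority.
    return min(matches)[1]


qualifiers = {"label": ["x"], "Gene": ["abc"], "note": ["z"]}
print(best_name_key(qualifiers))  # Gene
```

Here GENE (20) outranks LABEL (40), and the unrelated "note" key is skipped.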
class biocantor.io.features.FeatureIntervalIDQualifiers

Bases: enum.IntEnum

Keys that will be looked for to find identifiers for a feature.

The ordering of this enumeration matters, because the key-value pair found with the smallest value is the most important.

To leave room for future additions, the values of this enum are spaced 10 apart, so that up to 9 new items can be inserted at every level without changing the existing values.

NOTE: If names are added to or changed in this enum, you must also change FEATURE_INTERVAL_ID_QUALIFIERS.

FEATURE_ID = 0
ID = 255
biocantor.io.features.FEATURE_INTERVAL_NAME_QUALIFIERS
biocantor.io.features.FEATURE_INTERVAL_NAME_QUALIFIERS_REGEX
biocantor.io.features.FEATURE_INTERVAL_ID_QUALIFIERS
biocantor.io.features.FEATURE_INTERVAL_ID_QUALIFIERS_REGEX
biocantor.io.features.FEATURE_TYPE_IDENTIFIERS
biocantor.io.features.FEATURE_TYPE_IDENTIFIERS_REGEX
biocantor.io.features.extract_feature_types(feature_types: Set[str], feature_qualifiers: Dict[str, List[str]])

Extracts feature types from a qualifiers dictionary.

Parameters
  • feature_types – Set of feature types. Starts with the primary feature type.

  • feature_qualifiers – Qualifiers associated with a record from a GenBank parsing event.
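
As a rough illustration of the parameters above, the following sketch grows a set of feature types from a qualifiers dictionary. It is not the library's implementation: the key list HYPOTHETICAL_TYPE_KEYS is invented for the example (the actual contents of FEATURE_TYPE_IDENTIFIERS are not shown here), and updating the set in place is an assumption based on the signature having no documented return value.

```python
import re
from typing import Dict, List, Set

# Invented keys for illustration only; the real FEATURE_TYPE_IDENTIFIERS may differ.
HYPOTHETICAL_TYPE_KEYS = ["feature_type", "type"]
HYPOTHETICAL_TYPE_REGEX = re.compile(
    "|".join(re.escape(key) for key in HYPOTHETICAL_TYPE_KEYS), re.IGNORECASE
)


def extract_feature_types_sketch(
    feature_types: Set[str], feature_qualifiers: Dict[str, List[str]]
) -> None:
    """Add the values of every type-like qualifier key to feature_types in place."""
    for key, values in feature_qualifiers.items():
        if HYPOTHETICAL_TYPE_REGEX.fullmatch(key):
            feature_types.update(values)


types = {"CDS"}  # starts with the primary feature type
extract_feature_types_sketch(types, {"Type": ["promoter"], "note": ["x"]})
print(sorted(types))  # ['CDS', 'promoter']
```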

biocantor.io.features.extract_feature_name_id(feature_qualifiers: Dict[str, List[str]]) → Tuple[str, str]

Extract primary feature name and ID from a qualifiers dictionary based on the hierarchy in FeatureIntervalNameKeys.

If there is more than one value for a key, the first is chosen.

Parameters

feature_qualifiers – Qualifiers associated with a record from a GenBank parsing event.

Returns

The primary identifier name and ID.
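
The behavior described for this function can be approximated with a minimal sketch. It is not the library's code: the names NAME_KEYS, ID_KEYS, first_value and sketch_name_id are hypothetical, the key order follows the two enumerations documented on this page, and two points are assumptions: ID keys are matched case-insensitively like name keys, and None is returned when nothing matches.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Lower-cased keys in priority order, taken from the enum listings on this page.
NAME_KEYS = ("feature_name", "standard_name", "name", "gene", "gene_name", "label", "operon")
ID_KEYS = ("feature_id", "id")


def first_value(
    qualifiers: Dict[str, List[str]], ordered_keys: Sequence[str]
) -> Optional[str]:
    """Return the first value of the highest-priority key present, else None."""
    lowered = {key.lower(): values for key, values in qualifiers.items()}
    for key in ordered_keys:
        values = lowered.get(key)
        if values:
            # When a key has several values, only the first one is used.
            return values[0]
    return None


def sketch_name_id(
    qualifiers: Dict[str, List[str]]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (primary name, primary ID) under the assumptions stated above."""
    return first_value(qualifiers, NAME_KEYS), first_value(qualifiers, ID_KEYS)


print(sketch_name_id({"gene": ["abc", "def"], "label": ["x"], "ID": ["gene-1"]}))
# ('abc', 'gene-1')
```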

biocantor.io.features.merge_qualifiers(qualifiers: Dict[Hashable, List[str]], other_qualifiers: Dict[Hashable, List[str]]) → Dict[Hashable, List[str]]

Merges two dicts of lists using sets.

Could be made more efficient, probably.
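
One way a set-based merge of two dicts of lists might look is sketched below. The function name merge_qualifiers_sketch is hypothetical and this is not the library's implementation; in particular, sorting the merged values is an assumption made here only so the output is deterministic.

```python
from typing import Dict, Hashable, List, Set


def merge_qualifiers_sketch(
    qualifiers: Dict[Hashable, List[str]],
    other_qualifiers: Dict[Hashable, List[str]],
) -> Dict[Hashable, List[str]]:
    """Union the value lists of two dicts key by key, dropping duplicates."""
    merged: Dict[Hashable, Set[str]] = {
        key: set(values) for key, values in qualifiers.items()
    }
    for key, values in other_qualifiers.items():
        merged.setdefault(key, set()).update(values)
    # Sets are unordered, so sort to get a stable list (a choice of this sketch).
    return {key: sorted(values) for key, values in merged.items()}


print(merge_qualifiers_sketch({"gene": ["a", "b"]}, {"gene": ["b", "c"], "note": ["x"]}))
# {'gene': ['a', 'b', 'c'], 'note': ['x']}
```

Because the values pass through sets, duplicates are removed and the original ordering within each list is not preserved.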