biocantor.util.hashing

Perform MD5 digests of arbitrary in-memory Python objects. All objects are unpacked and their string representations are digested.

TODO: The current implementation has not been profiled and likely has room for optimization that could improve performance on larger datasets.

Module Contents

Functions

_order_set(→ List[str])

Lexicographically orders a set and converts all values to strings to enable order comparison.

_order_dict_of_possible_sets(→ Iterable[str])

Recursively search a dictionary for sets and order them.

_encode_object_for_digest(→ Iterable[str])

Inner function of digest_object() that produces the string representations to be hashed. Keeping this step separate helps with debugging.

digest_object(→ uuid.UUID)

MD5 digest of an arbitrary set of Python objects. Their string representations must be UTF-8 encodable.

biocantor.util.hashing._order_set(set_of_hashables: Set[Hashable]) → List[str]

Lexicographically orders a set and converts all values to strings to enable order comparison.

Parameters

set_of_hashables – A set of hashable items

Returns

A lexicographically ordered list of strings.
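The convert-then-sort step described above can be sketched as follows. This is a minimal illustration, not biocantor's source; `order_set` is a hypothetical name standing in for `_order_set`. Stringifying first matters because Python 3 refuses to compare, for example, an `int` with a `str`.

```python
from typing import Hashable, List, Set


def order_set(set_of_hashables: Set[Hashable]) -> List[str]:
    """Hypothetical stand-in for _order_set: stringify members, then sort."""
    # str() first so mixed types (int, str, tuple, ...) become comparable.
    return sorted(str(item) for item in set_of_hashables)


print(order_set({"b", 10, "a", 2}))  # ['10', '2', 'a', 'b']
```

The ordering is lexicographic on the strings, so `"10"` sorts before `"2"`. That is acceptable for hashing, where only a consistent order matters.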

biocantor.util.hashing._order_dict_of_possible_sets(dict_of_possible_sets: Dict[Hashable, Any]) → Iterable[str]

Recursively search a dictionary for sets and order them.

Parameters
dict_of_possible_sets – Any dictionary. The values must have stable string representations that are uniquely hashable (unless these values are a set or a dictionary of sets).

Returns

Ordered list of tuples
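A recursive walk of this kind might look like the sketch below. It assumes the generator yields `(key, value)` pairs, following the "ordered list of tuples" wording above rather than the `Iterable[str]` annotation. The function names are hypothetical, and in this sketch dictionary keys are left in insertion order; only set members are sorted.

```python
from typing import Any, Dict, Hashable, Iterable, List, Set, Tuple


def order_set(values: Set[Hashable]) -> List[str]:
    # Stringify members so mixed types can be compared, then sort.
    return sorted(str(v) for v in values)


def order_dict_of_possible_sets(d: Dict[Hashable, Any]) -> Iterable[Tuple[Hashable, Any]]:
    """Hypothetical sketch: yield (key, value) pairs with nested sets sorted."""
    for key, value in d.items():
        if isinstance(value, (set, frozenset)):
            yield key, order_set(value)
        elif isinstance(value, dict):
            # Recurse so sets nested at any depth are also ordered.
            yield key, list(order_dict_of_possible_sets(value))
        else:
            # Other values are assumed to already have a stable str().
            yield key, value


nested = {"genes": {"BRCA2", "BRCA1"}, "meta": {"tags": {"y", "x"}, "n": 3}}
print(list(order_dict_of_possible_sets(nested)))
# [('genes', ['BRCA1', 'BRCA2']), ('meta', [('tags', ['x', 'y']), ('n', 3)])]
```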

biocantor.util.hashing._encode_object_for_digest(*args, **kwargs) → Iterable[str]

Inner function of digest_object() that produces the string representations to be hashed. Keeping this step separate helps with debugging.
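To illustrate why a separate string-producing step aids debugging, here is a hypothetical generator that yields one string per argument; printing its output shows exactly what would be fed to the hash. The names `encode_object_for_digest` and `_normalize`, and the `name=value` format for keyword arguments, are assumptions for illustration and may differ from biocantor's actual encoding.

```python
from typing import Any, Iterable


def _normalize(value: Any) -> Any:
    # Hypothetical helper: sets become sorted string lists; dicts are walked recursively.
    if isinstance(value, (set, frozenset)):
        return sorted(str(v) for v in value)
    if isinstance(value, dict):
        return [(k, _normalize(v)) for k, v in value.items()]
    return value


def encode_object_for_digest(*args: Any, **kwargs: Any) -> Iterable[str]:
    """Hypothetical sketch: yield one string per positional and keyword argument."""
    for arg in args:
        yield str(_normalize(arg))
    for name, value in kwargs.items():
        # The keyword name is included so it becomes part of the digest.
        yield f"{name}={_normalize(value)}"


# Printing the stream shows exactly what would be hashed.
for s in encode_object_for_digest("chr1", {3, 1}, strand={"+", "-"}):
    print(s)
# chr1
# ['1', '3']
# strand=['+', '-']
```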

biocantor.util.hashing.digest_object(*args, **kwargs) → uuid.UUID

MD5 digest of an arbitrary set of Python objects. Their string representations must be UTF-8 encodable.

Because the string representations of sets are not guaranteed to list their elements in the same order across different Python interpreters, we recursively search args and kwargs for sets and sort them. This sorting requires converting set members to their string representations. All items in args and kwargs are also UTF-8 encoded before hashing.
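The cross-interpreter instability comes from string hash randomization, which the `PYTHONHASHSEED` environment variable controls. The script below runs a child interpreter under several seeds. The raw set repr may change between runs (it usually does, though this is not guaranteed for any particular pair of seeds), while the sorted form stays fixed. The set contents are arbitrary example values.

```python
import os
import subprocess
import sys
from typing import List

# Child program: print a set's repr, then its sorted form.
CHILD = "s = {'exon', 'intron', 'cds', 'utr', 'gene'}; print(repr(s)); print(sorted(s))"


def run_with_seed(seed: int) -> List[str]:
    # Override the hash seed for the child interpreter only.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    completed = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout.splitlines()


results = [run_with_seed(seed) for seed in range(5)]
for set_repr, sorted_repr in results:
    print(set_repr, "->", sorted_repr)
```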

Args can be any objects with stable string representations, as well as sets. Kwargs can be any objects with stable string representations, including nested dicts of sets. The argument names in kwargs are part of the hash produced.
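A self-contained sketch of the whole pipeline, assuming `hashlib.md5` over UTF-8 encoded strings and packing the 16-byte digest into a `uuid.UUID`. It is not biocantor's implementation. In particular, the length prefix on each string is this sketch's own choice, added so that `digest_object("ab")` and `digest_object("a", "b")` do not collide; the helpers `_normalize` and `_feed` are hypothetical.

```python
import hashlib
import uuid
from typing import Any


def _normalize(value: Any) -> Any:
    # Sets become sorted lists of strings; dicts are normalized recursively.
    if isinstance(value, (set, frozenset)):
        return sorted(str(v) for v in value)
    if isinstance(value, dict):
        return [(k, _normalize(v)) for k, v in value.items()]
    return value


def _feed(hasher: Any, text: str) -> None:
    # Length-prefix each piece so boundaries between arguments are unambiguous.
    hasher.update(f"{len(text)}:{text}".encode("utf-8"))


def digest_object(*args: Any, **kwargs: Any) -> uuid.UUID:
    """Hypothetical sketch of an order-insensitive MD5 digest returned as a UUID."""
    hasher = hashlib.md5()
    for arg in args:
        _feed(hasher, str(_normalize(arg)))
    for name, value in kwargs.items():
        # The keyword name is hashed too, so digest_object(x=1) != digest_object(y=1).
        _feed(hasher, f"{name}={_normalize(value)}")
    # An MD5 digest is 16 bytes, exactly the size of a UUID.
    return uuid.UUID(bytes=hasher.digest())


a = digest_object("chr1", {"exon", "cds"}, strand={"+"})
b = digest_object("chr1", {"cds", "exon"}, strand={"+"})
print(a == b)  # True
```

Returning a `uuid.UUID` rather than a hex string gives a fixed-size, comparable identifier; that is the return type documented above, while the hashing details here are only one plausible way to produce it.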