Python package to hash dictionaries using default hash, md5, sha256 and more.

Dict Hash

Python package to hash dictionaries using either Python's native hash function or hashlib digests such as sha256. It comes with full support for hashing Pandas and Polars DataFrame/Series objects, Numba objects and NumPy arrays. It supports Pandas 1.x and 2.x as well as NumPy 1.x and 2.x.

Furthermore, the library supports objects that can be recursively hashed.

As we saw this library being used in the wild mostly to create caching libraries and wrappers, we'd like to point you to our library, Cache decorator.

Why can't I just use the default hash function?

In Python, dictionaries are not hashable: they are mutable, and a hash computed from their contents would change whenever the contents change, so the built-in hash function refuses them. Running hash({}) raises a TypeError exception.
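
The snippet below uses only the standard library to show the failure. It also shows why the common frozenset workaround is not enough for nested dictionaries. The variable names are illustrative.

```python
# Dictionaries are mutable, so the built-in hash refuses them.
try:
    hash({})
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# An immutable snapshot of the items can be hashed, but only when
# every value is itself hashable.
flat = {"a": 1, "b": 2}
snapshot_hash = hash(frozenset(flat.items()))

# A nested dictionary breaks the workaround: building the frozenset
# has to hash the inner dict, which raises TypeError again.
nested = {"a": {"b": 1}}
try:
    hash(frozenset(nested.items()))
    nested_is_hashable = True
except TypeError:
    nested_is_hashable = False

print(dict_is_hashable, nested_is_hashable)
```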

How do I install this package?

As usual, just install it using pip:

pip install dict_hash

Usage examples

The package offers dict_hash, which generates hashes using the native hash function, and a family of functions named after hashlib algorithms, such as sha256, which generate hashes that stay constant across sessions.

Session hash with dict_hash

Obtain a session hash from the given dictionary.

from dict_hash import dict_hash
from random_dict import random_dict
from random import randint

d = random_dict(randint(1, 10), randint(1, 10))
my_hash = dict_hash(d)
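
A hash built on the native hash function is only stable within one interpreter session, because Python salts the hashes of strings and bytes with a per-process seed (controlled by the PYTHONHASHSEED environment variable). The sketch below illustrates this with the standard library alone; it does not call dict_hash, and the helper name is hypothetical.

```python
import os
import subprocess
import sys


def string_hash_in_new_process(seed: str) -> int:
    """Return hash('dict_hash') as computed by a fresh interpreter.

    Hypothetical helper for illustration; `seed` is passed to the child
    process through the PYTHONHASHSEED environment variable.
    """
    env = {**os.environ, "PYTHONHASHSEED": seed}
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('dict_hash'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


# Two sessions sharing a seed agree on the hash of the same string.
same_seed_a = string_hash_in_new_process("1")
same_seed_b = string_hash_in_new_process("1")

# A session with another seed will, in practice, produce another value.
other_seed = string_hash_in_new_process("2")

print(same_seed_a == same_seed_b, same_seed_a, other_seed)
```

When values must survive a restart or be shared between machines, a digest-based function is the safer choice.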

Consistent hashes

Obtain a consistent hash from the given dictionary. Supported methods include md5, sha256, sha1, sha224, sha384, sha512, sha3_512, shake_128, shake_256, sha3_384, sha3_256, sha3_224, blake2s, blake2b, as provided by the hashlib library.

For instance, to obtain a sha256 hash from the given dictionary:

from dict_hash import sha256
from random_dict import random_dict
from random import randint

d = random_dict(randint(1, 10), randint(1, 10))
my_hash = sha256(d)
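
To give an intuition for what "consistent" means here, the following sketch serializes a dictionary to a canonical JSON string and digests it with hashlib. This is not the dict_hash implementation, only a simplified illustration that assumes JSON-serializable values; sketch_sha256 is a hypothetical name.

```python
import hashlib
import json


def sketch_sha256(dictionary: dict) -> str:
    """Hypothetical helper, NOT the dict_hash implementation.

    Sorting the keys makes the serialized form independent of the
    insertion order, so equal dictionaries give equal digests.
    """
    canonical = json.dumps(dictionary, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = sketch_sha256({"a": 1, "b": [1, 2, 3]})
second = sketch_sha256({"b": [1, 2, 3], "a": 1})
changed = sketch_sha256({"a": 2, "b": [1, 2, 3]})

print(first == second, first == changed)
```

Because the digest depends only on the serialized content, the same dictionary yields the same value in every session and on every machine.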

The methods shake_128 and shake_256 expose the hash_length parameter to specify the length of the hash digest.

from dict_hash import shake_128
from random_dict import random_dict
from random import randint

d = random_dict(randint(1, 10), randint(1, 10))
my_hash = shake_128(d, hash_length=16)
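
The SHAKE algorithms are extendable-output functions, which is why they need a length at all. The snippet below shows the underlying hashlib behavior directly. Whether hash_length in dict_hash counts bytes or hexadecimal characters is not stated here, so treat the mapping to hashlib's length argument as an assumption.

```python
import hashlib

payload = b'{"a":1}'

# hashlib's hexdigest(length) takes a length in bytes, so the
# hexadecimal string is twice as long.
short_digest = hashlib.shake_128(payload).hexdigest(8)
long_digest = hashlib.shake_128(payload).hexdigest(16)

print(len(short_digest), len(long_digest))

# A shorter SHAKE output is a prefix of a longer one for the same input.
print(long_digest.startswith(short_digest))
```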

Approximated hash

All of the methods shown offer the use_approximation parameter, which switches to a more lightweight hashing procedure for the object types that support it. This procedure randomly subsamples the provided objects instead of hashing them in full.

Currently, we support this parameter for NumPy, Polars, and Pandas objects.

from dict_hash import sha256
from random_dict import random_dict
from random import randint

d = random_dict(randint(1, 10), randint(1, 10))
my_hash = sha256(d)

approximated_hash = sha256(d, use_approximation=True)
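
The trade-off behind subsampling can be illustrated with a standard-library sketch. It is not how dict_hash subsamples NumPy, Polars or Pandas objects. The function name, sample size and fixed seed are all assumptions chosen for this illustration.

```python
import hashlib
import random


def approximate_hash(values: list, sample_size: int = 100, seed: int = 42) -> str:
    """Hypothetical sketch: hash the length plus a fixed-seed random sample."""
    n = len(values)
    if n <= sample_size:
        sampled = list(values)
    else:
        # A fixed seed keeps the sampled positions identical across calls.
        indices = sorted(random.Random(seed).sample(range(n), sample_size))
        sampled = [values[i] for i in indices]
    payload = repr((n, sampled)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


data = list(range(10_000))
baseline = approximate_hash(data)

# Find a position the sampler never looks at and change the value there.
sampled_indices = set(random.Random(42).sample(range(len(data)), 100))
unsampled_index = next(i for i in range(len(data)) if i not in sampled_indices)
modified = list(data)
modified[unsampled_index] = -1

# The change goes unnoticed: that is the price of the speed-up.
print(approximate_hash(modified) == baseline)
```

An approximated hash is therefore best suited to large objects where an occasional missed change is acceptable.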

Behavior on error

If the hashing function encounters an object that it cannot hash, it raises a NotHashableException exception by default. You can change this with the behavior_on_error parameter, which accepts one of:

  • raise: Raise a NotHashableException exception.
  • warn: Emit a NotHashableWarning and continue hashing, replacing the unhashable object with the string "Unhashable object".
  • ignore: Silently continue hashing, replacing the unhashable object with the string "Unhashable object".
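
The three modes can be pictured with the standard-library sketch below. It is a simplified stand-in, not the dict_hash code: to_hashable and sketch_hash are hypothetical names, and TypeError and a plain warning stand in for NotHashableException and NotHashableWarning.

```python
import hashlib
import json
import warnings

PLACEHOLDER = "Unhashable object"


def to_hashable(obj, behavior_on_error: str = "raise"):
    """Hypothetical helper: convert obj to JSON-friendly data."""
    if behavior_on_error not in ("raise", "warn", "ignore"):
        raise ValueError(f"Unknown behavior_on_error: {behavior_on_error!r}")
    if isinstance(obj, dict):
        return {str(k): to_hashable(v, behavior_on_error) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_hashable(v, behavior_on_error) for v in obj]
    if obj is None or isinstance(obj, (str, int, float, bool)):
        return obj
    # Anything else is treated as unhashable in this sketch.
    message = f"Cannot hash object of type {type(obj).__name__}"
    if behavior_on_error == "raise":
        raise TypeError(message)  # stand-in for NotHashableException
    if behavior_on_error == "warn":
        warnings.warn(message)  # stand-in for NotHashableWarning
    return PLACEHOLDER


def sketch_hash(dictionary: dict, behavior_on_error: str = "raise") -> str:
    """Hypothetical helper: sha256 of the canonical JSON form."""
    data = to_hashable(dictionary, behavior_on_error)
    return hashlib.sha256(json.dumps(data, sort_keys=True).encode("utf-8")).hexdigest()


d = {"a": 1, "callback": object()}
ignored = sketch_hash(d, behavior_on_error="ignore")

# The unhashable value was replaced by the placeholder string.
print(ignored == sketch_hash({"a": 1, "callback": PLACEHOLDER}))
```

One consequence of the warn and ignore modes is that two dictionaries differing only in their unhashable values produce the same hash.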

Recursive objects

In Python it is possible to have recursive objects, such as a dictionary that contains itself. When you attempt to hash such an object, the hashing function raises a RecursionError exception once the nesting exceeds the maximal_recursion parameter, which defaults to 100. The RecursionError is then most commonly handled as a NotHashableException, so you can set the behavior_on_error parameter to handle it as you see fit.
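
The following standard-library sketch shows how a self-containing dictionary arises and how a depth limit turns an endless descent into a RecursionError. The count_leaves function is a hypothetical illustration. Only the parameter name maximal_recursion and its default of 100 are taken from the text above.

```python
def count_leaves(obj, depth: int = 0, maximal_recursion: int = 100) -> int:
    """Hypothetical walker: count non-dict values, with a depth limit."""
    if depth > maximal_recursion:
        raise RecursionError(
            f"Nesting deeper than {maximal_recursion} levels, "
            "the object may contain itself."
        )
    if isinstance(obj, dict):
        return sum(
            count_leaves(value, depth + 1, maximal_recursion)
            for value in obj.values()
        )
    return 1


# A dictionary that contains itself.
recursive = {"a": 1}
recursive["self"] = recursive

try:
    count_leaves(recursive)
    hit_limit = False
except RecursionError:
    hit_limit = True

print(hit_limit)
print(count_leaves({"a": {"b": 1}, "c": 2}))
```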

Hashable

When the dictionaries contain complex objects, you may need to make their class extend Hashable and implement its consistent_hash method.

Here is an example:

from time import time

from dict_hash import Hashable, sha256

class MyHashable(Hashable):

    def __init__(self, a: int):
        self._a = a
        # Creation timestamp; consistent_hash below does not include it.
        self._time = time()

    def consistent_hash(self, use_approximation: bool = False) -> str:
        # The use approximation would be useful when the object is too large,
        # while in this example it may be a bit pointless.
        if use_approximation:
            return sha256({
                "a": self._a
            }, use_approximation=True)
        return sha256({
            "a": self._a
        })

License

This software is distributed under the MIT license.
