Easily dump Python objects to files, and then load them back.

Project description

Vlermv makes it easy to save Python objects to files with meaningful identifiers. The package Vlermv provides two interfaces.

Vlermv

vlermv.Vlermv is a dictionary interface.

Vlermv Cache

vlermv.cache is a decorator that you can use for caching the output of a function.

Using Vlermv

Vlermv provides a dictionary-like object that is associated with a particular directory on your computer.

from vlermv import Vlermv
vlermv = Vlermv('/tmp/a-directory')

The keys correspond to files, and the values get pickled to the files.

vlermv['filename'] = range(100)

import pickle
with open('/tmp/a-directory/filename', 'rb') as fp:
    range(100) == pickle.load(fp)

You can also read and delete things.

# Read
range(100) == vlermv['filename']

# Delete
del vlermv['filename']
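
To make the idea concrete, here is a minimal sketch of a dictionary-like object that pickles values into files, using only the standard library. TinyStore is a hypothetical name, and this is not how Vlermv is actually implemented; it only illustrates the key-to-file correspondence described above.

```python
import os
import pickle
import tempfile


class TinyStore:
    """Hypothetical stand-in: each key is a file name, each value a pickle."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, key):
        return os.path.join(self.directory, key)

    def __setitem__(self, key, value):
        os.makedirs(self.directory, exist_ok=True)
        with open(self._path(key), 'wb') as fp:
            pickle.dump(value, fp)

    def __getitem__(self, key):
        try:
            with open(self._path(key), 'rb') as fp:
                return pickle.load(fp)
        except FileNotFoundError:
            raise KeyError(key)

    def __delitem__(self, key):
        try:
            os.remove(self._path(key))
        except FileNotFoundError:
            raise KeyError(key)


# Use a throwaway directory so the example leaves nothing behind in /tmp.
store = TinyStore(tempfile.mkdtemp())
store['filename'] = list(range(100))
```

Missing files are reported as KeyError here so that the object behaves like a dictionary rather than like a file API.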

The coolest part is that the key gets interpreted in a fancy way. Aside from strings and string-like objects, you can use iterables of strings; all of these indices refer to the file /tmp/a-directory/foo/bar/baz:

vlermv[('foo','bar','baz')]
vlermv[['foo','bar','baz']]

If you pass a relative path to a file, it will be broken up as you’d expect; that is, strings get split on slashes and backslashes.

vlermv['foo/bar/baz']
vlermv['foo\\bar\\baz']

Note well: Specifying an absolute path won’t save things outside the vlermv directory.

vlermv['/foo/bar/baz'] # -> foo, bar, baz
vlermv['C:\\foo\\bar\\baz'] # -> c, foo, bar, baz
                               # (lowercase "c")

If you pass a URL, it will also get broken up in a reasonable way.

# /tmp/a-directory/http/thomaslevine.com/!/?foo=bar#baz
vlermv['http://thomaslevine.com/!/?foo=bar#baz']

# /tmp/a-directory/thomaslevine.com/!?foo=bar#baz
vlermv['thomaslevine.com/!?foo=bar#baz']

Dates and datetimes get converted to YYYY-MM-DD format.

import datetime

# /tmp/a-directory/2014-02-26
vlermv[datetime.date(2014,2,26)]
vlermv[datetime.datetime(2014,2,26,13,6,42)]

And you can mix these formats!

# /tmp/a-directory/http/thomaslevine.com/open-data/2014-02-26
vlermv[('http://thomaslevine.com/open-data', datetime.date(2014,2,26))]
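
If you want a feel for how such key handling could work, here is a rough sketch. The function key_to_parts is a hypothetical helper, not part of Vlermv, and it only covers plain strings, dates, and iterables; it does not try to reproduce the URL or drive-letter handling shown above, and it does nothing special about ".." components.

```python
import datetime
import re


def key_to_parts(key):
    """Hypothetical helper: flatten a key into a list of path components."""
    # datetime is a subclass of date, so check it first and drop the time.
    if isinstance(key, datetime.datetime):
        return [key.date().isoformat()]
    if isinstance(key, datetime.date):
        return [key.isoformat()]
    if isinstance(key, str):
        # Split on slashes and backslashes. Dropping empty pieces means a
        # leading slash cannot point outside the base directory.
        return [part for part in re.split(r'[/\\]', key) if part]
    # Anything else is treated as an iterable of keys and flattened.
    parts = []
    for item in key:
        parts.extend(key_to_parts(item))
    return parts
```

The resulting list could then be joined onto the base directory with os.path.join to get the file path.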

It also has typical dictionary methods like keys, values, items, and update.

Using Vlermv Cache

A function receives input, does something, and then returns output. If you decorate a function with Vlermv Cache, it caches the output; if you call the function again with the same input, it loads the output from the cache instead of doing what it would normally do.

The simplest usage is to decorate the function with @vlermv.cache(). For example,

import vlermv

@vlermv.cache()
def is_prime(number):
    for n in range(2, number):
        if number % n == 0:
            return False
    return True

Now you can call is_prime as if it’s a normal function, and if you call it twice, the second call will load from the cache.
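
Here is a small standard-library sketch of the caching behaviour, in case it helps to see it spelled out. The decorator disk_cache is a hypothetical stand-in, not Vlermv's implementation, and the way it builds file names from the arguments is my own simplification. The calls list is only there to show that the function body runs once.

```python
import functools
import os
import pickle
import tempfile


def disk_cache(directory):
    """Hypothetical decorator: pickle each result to a file in `directory`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Simplified file name: the arguments joined with dashes.
            path = os.path.join(directory, '-'.join(str(a) for a in args))
            if os.path.exists(path):
                with open(path, 'rb') as fp:
                    return pickle.load(fp)
            result = func(*args)
            os.makedirs(directory, exist_ok=True)
            with open(path, 'wb') as fp:
                pickle.dump(result, fp)
            return result
        return wrapper
    return decorator


calls = []  # records every time the real function body runs


@disk_cache(tempfile.mkdtemp())
def is_prime(number):
    calls.append(number)
    for n in range(2, number):
        if number % n == 0:
            return False
    return True


first = is_prime(97)   # computed, then written to disk
second = is_prime(97)  # loaded from disk; the body does not run again
```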

Some fancier uses are discussed below.

Non-default directory

If you pass no arguments to cache, as in the example above, the cache will be stored in a directory named after the function. To set a different directory, pass it as an argument.

@vlermv.cache('~/.primes')
def is_prime(number):
    for n in range(2, number):
        if number % n == 0:
            return False
    return True

I recommend storing your caches in dotted directories under your home directory, as you see above.

Configuration

The kwargs get passed to vlermv.Vlermv, so you can do fun things like changing the serialization function.

import requests

@vlermv.cache('~/.http', serializer = vlermv.serializers.identity)
def get(url):
    return requests.get(url).text

Read more about the keyword arguments in the Vlermv section above.

Non-identifying arguments

If you want to pass an argument but not use it as an identifier, pass it as a keyword argument; keyword arguments get passed along to the function but don’t form the identifier. For example,

@vlermv.cache('~/.http')
def get(url, auth = None):
    return requests.get(url, auth = auth)

get('http://this.website.com', auth = ('username', 'password'))
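
One consequence is worth seeing in code: since keyword arguments are not part of the identifier, two calls that differ only in their keyword arguments share one cache entry. The sketch below shows this with an in-memory dictionary; cache_on_positional is a hypothetical decorator, not Vlermv's own code, and get here just echoes its arguments instead of making an HTTP request.

```python
import functools


def cache_on_positional(func):
    """Hypothetical decorator: only positional arguments form the cache key."""
    store = {}

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if args not in store:
            # Keyword arguments reach the function but are not in the key.
            store[args] = func(*args, **kwargs)
        return store[args]
    return wrapper


@cache_on_positional
def get(url, auth=None):
    return (url, auth)


a = get('http://this.website.com', auth=('username', 'password'))
# Same URL, different credentials: the first result comes back from the cache.
b = get('http://this.website.com', auth=('other', 'secret'))
```

So only put things in keyword arguments if they shouldn't change the result you want cached.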

Refreshing the cache

I find that I sometimes want to refresh the cache only for a particular file. This is usually because an error occurred and I have fixed the error. You can delete the cached entry like this.

@vlermv.cache()
def is_prime(number):
    for n in range(2, number):
        if number % n == 0:
            return False
    return True

is_prime(100)
del is_prime[100]

Vlermv Cache has all of Vlermv’s features

The above method for refreshing the cache works because is_prime isn’t really a function; it’s actually a VlermvCache object, which is a subclass of Vlermv. Thus, you can use it in all of the ways that you can use Vlermv.

@vlermv.cache()
def f(x, y):
    return x + y

print(f(3,4))
# 7

print(list(f.keys()))
# ['3/4']

You can even set the value to be something weird.

f[('a', 8)] = None, {'key':'value'}
print(f('a', 8))
# {'key': 'value'}

Each value in f is a tuple of the error and the actual value. Exactly one of these is always None. If the error is None, the value is returned, and if the value is None, the error is raised.
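
The rule for reading such a tuple back can be sketched in a few lines. The function unwrap is a hypothetical name for illustration; I am not claiming this is the code inside VlermvCache.

```python
def unwrap(entry):
    """Hypothetical helper: turn a stored (error, value) pair into a result."""
    error, value = entry
    if error is not None:
        # A cached failure is raised again instead of being returned.
        raise error
    return value
```

With this convention, unwrap((None, {'key': 'value'})) returns the dictionary, while unwrap((ValueError('boom'), None)) raises the ValueError.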

Better than Mongo

Vlermv is nearly better than Mongo, so you should use it anywhere you were previously using Mongo. Vlermv is designed for write-heavy workloads that need scalability (easy sharding), flexible schemas, and highly configurable indexing.

Things that are missing for a full Mongo replacement

  • Protection against inode exhaustion

  • Ability to treat a directory as a document and thus to make atomic edits within a document

  • Transactions (Mongo doesn’t have them, but they would be cool.)

  • Indices maybe? In case you want an index on something other than the filename

ACID properties

Atomicity

Writes are made to a temporary file that gets renamed.
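
The write-then-rename pattern looks roughly like this with the standard library. The function atomic_pickle_dump is a hypothetical name, and putting the temporary file in the same directory as the target is my assumption (it keeps the rename on one filesystem); Vlermv may place its temporary files elsewhere.

```python
import os
import pickle
import tempfile


def atomic_pickle_dump(obj, path):
    """Hypothetical helper: write a pickle so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Assumption: the temporary file lives next to the target file.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as fp:
            pickle.dump(obj, fp)
        # os.replace swaps the finished file into place in one step.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

A reader that opens the target path sees either the old contents or the new contents, never a half-written pickle.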

Consistency

No validation is supported, so the database is always consistent by definition.

Isolation

Vlermv has isolation within files/documents/values but not across. You may implement your own multi-file transactions.

Durability

All data are saved to disk right away.

History

I wrote pickle warehouse so I could easily store pickles in files with meaningful names. Then I extended it to support more than just pickles. Then I wrote picklecache, mainly for caching responses to HTTP requests.

And then I finally changed the names because these two packages don’t really have much to do with pickles. pickle_warehouse.Warehouse became vlermv.Vlermv, and picklecache.cache became vlermv.cache. I chose the name “vlermv” by banging on the keyboard; this is how I have been naming things now that I have discovered Dada.

During the transition from pickle warehouse and pickle cache to vlermv, I also added a particular feature that I wanted: the ability to refresh the cache. (And I implemented this by subclassing vlermv.Vlermv, as discussed in the documentation above.)

Download files

Source distribution: vlermv-0.2.3.tar.gz (6.5 kB)
