wraps gsutil, a command-line interface to Google Cloud Storage.

gsutilwrap

gsutilwrap wraps the Google Cloud Storage gsutil command-line interface to simplify deployment and backup tasks involving Google Cloud Storage. It provides a set of data manipulation commands including copying, reading, writing and hashing stored data.

We primarily needed something simple that could still leverage multi-threading, had decent progress output and implemented robust pattern matching. Since the gsutil CLI already provides all this functionality, we decided to wrap it. The wrapper adds type-annotated arguments, which enable code inspection and autocompletion in an IDE such as PyCharm.
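
To make the idea of wrapping concrete, here is a minimal sketch of spawning gsutil from Python. It is not gsutilwrap's actual implementation, and `build_gsutil_command` and `run_gsutil` are hypothetical names. The one gsutil detail it relies on is the top-level `-m` option, which makes gsutil perform operations such as `cp` in parallel.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_gsutil_command(args: Sequence[str], multithreaded: bool = False) -> List[str]:
    """Assemble a gsutil command line from sub-command arguments."""
    command = ["gsutil"]
    if multithreaded:
        # '-m' is a top-level option and must precede the sub-command ('gsutil -m cp ...').
        command.append("-m")
    command.extend(args)
    return command


def run_gsutil(args: Sequence[str], multithreaded: bool = False) -> str:
    """Spawn gsutil as a child process and return its standard output as text."""
    if shutil.which("gsutil") is None:
        raise FileNotFoundError("gsutil is not on PATH")
    result = subprocess.run(
        build_gsutil_command(args, multithreaded),
        stdout=subprocess.PIPE,  # capture the data gsutil prints (e.g. for 'cat' or 'ls')
        check=True,              # raise CalledProcessError on a non-zero exit code
        universal_newlines=True, # decode stdout as text
    )
    # stderr is not captured, so gsutil's progress messages stay visible in the terminal.
    return result.stdout
```

For example, `run_gsutil(['cat', 'gs://some-bucket/some-path/some-file.txt'])` would return the object's content as text. Each call pays the cost of starting a new process, which is the overhead mentioned further below.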

Additionally, since gsutil lacked support for copying multiple patterns to multiple targets, we added this feature to gsutilwrap.
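
One way to build such a many-to-many copy on top of gsutil, sketched under the assumption that issuing one `gsutil cp` per distinct target is acceptable: group the patterns by target and pass every pattern of a group as a source, since `gsutil cp` accepts several sources followed by a single destination. This is an illustration, not necessarily how gsutilwrap implements `copy_many_to_many`; `group_by_target` and `cp_commands` are hypothetical helpers.

```python
from typing import Dict, List, Sequence, Tuple


def group_by_target(patterns_targets: Sequence[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Collect the patterns that share a target; dicts keep first-seen order (Python 3.7+)."""
    groups = {}  # type: Dict[str, List[str]]
    for pattern, target in patterns_targets:
        groups.setdefault(target, []).append(pattern)
    return groups


def cp_commands(patterns_targets: Sequence[Tuple[str, str]],
                multithreaded: bool = True) -> List[List[str]]:
    """Build one 'gsutil cp' command line per distinct target."""
    commands = []
    for target, patterns in group_by_target(patterns_targets).items():
        command = ["gsutil"]
        if multithreaded:
            command.append("-m")  # parallel copying within each command
        # All sources of a group first, the shared destination last.
        command += ["cp"] + patterns + [target]
        commands.append(command)
    return commands
```

Each resulting command could then be executed with `subprocess.run(command, check=True)`, either one after another or from a thread pool.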

If you need to transfer data to or from Google Cloud Storage in the core of your application, we recommend using the google-cloud-storage library provided by Google itself. That library offers many more features and does not incur the overhead of authorizing and spawning a process for each operation. However, it lacks pattern matching (apart from prefix matching), and you have to manage multi-threading and progress output yourself.
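
To illustrate what doing pattern matching yourself means with google-cloud-storage: the server-side listing only filters by prefix, so wildcards have to be applied on the client. The sketch below uses the literal part of a pattern as the listing prefix and matches the returned object names with a regular expression. Its wildcard rules only approximate gsutil's (`*` and `?` stay within one `/`-separated level, `**` crosses levels; nothing else is handled), and the helper names are hypothetical.

```python
import re
from typing import Iterable, List


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a gsutil-style wildcard pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")     # '**' may span several '/'-separated levels
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")  # '*' stays within a single level
            i += 1
        elif pattern[i] == "?":
            parts.append("[^/]")   # '?' matches exactly one character, never '/'
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def literal_prefix(pattern: str) -> str:
    """Return the part before the first wildcard, usable as a server-side listing prefix."""
    match = re.search(r"[*?]", pattern)
    return pattern if match is None else pattern[:match.start()]


def filter_names(names: Iterable[str], pattern: str) -> List[str]:
    """Keep only the object names that match the whole pattern."""
    regex = wildcard_to_regex(pattern)
    return [name for name in names if regex.fullmatch(name)]
```

With google-cloud-storage the names would come from something like `[blob.name for blob in client.list_blobs('some-bucket', prefix=literal_prefix(pattern))]`, where `client` is a `google.cloud.storage.Client`.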

Usage

import gsutilwrap

# list
lst = gsutilwrap.ls(
    'gs://some-bucket/some-path/**/*.txt')

lst = gsutilwrap.ls_many(
    ['gs://some-bucket/some-path/**/*.txt',
     'gs://another-bucket/another-path/**/*.xml'],
    multithreaded=True)

# if you need a listing with size and update time, use long_ls
entries = gsutilwrap.long_ls(
    'gs://some-bucket/some-path/**/*.txt')

for entry in entries:
    print("File size and update time of {}: {} {}".format(
        entry.url, entry.size, entry.update_time))

# write/read text
gsutilwrap.write_text(
    url='gs://some-bucket/some-path/some-file.txt',
    text='some text')

text = gsutilwrap.read_text(
    url='gs://some-bucket/some-path/some-file.txt')

# write/read bytes
gsutilwrap.write_bytes(
    url='gs://some-bucket/some-path/some-file.bin',
    data=b'\xDE\xAD\xBE\xEF')

data = gsutilwrap.read_bytes(
    url='gs://some-bucket/some-path/some-file.bin')

# copy
gsutilwrap.copy(
    pattern="gs://some-bucket/some-path/*.txt",
    target="/some/dir")

gsutilwrap.copy_many_to_one(
    patterns=[
        "gs://some-bucket/some-path/*.txt",
        "gs://some-bucket/some-path/*.xml"
    ],
    target="/some/dir")

gsutilwrap.copy_many_to_many(
    patterns_targets=[
        ("gs://some-bucket/some-path/*.txt", "/some/dir"),
        ("gs://some-bucket/some-path/*.xml", "/some/other/dir")
    ])

# stat an object
stat = gsutilwrap.stat(
    url='gs://some-bucket/some-path/some-file.txt')
print("Modification time: {}".format(stat.file_mtime))
print("Size: {}".format(stat.content_length))
print("MD5: {}".format(stat.md5.hex()))
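
Since `stat` exposes the object's MD5 digest, it can be used to check whether a local copy is identical to the remote object. A minimal sketch, assuming `stat.md5` holds the raw digest as `bytes` (which the `.hex()` call above suggests); `md5_of_file` and `matches_remote` are hypothetical helpers.

```python
import hashlib
import pathlib


def md5_of_file(path: pathlib.Path, chunk_size: int = 1 << 20) -> bytes:
    """Compute the raw MD5 digest of a local file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with path.open("rb") as fid:
        for chunk in iter(lambda: fid.read(chunk_size), b""):
            digest.update(chunk)
    return digest.digest()


def matches_remote(path: pathlib.Path, remote_md5: bytes) -> bool:
    """Compare a local file against the MD5 digest reported for the remote object."""
    return md5_of_file(path) == remote_md5
```

For example, `matches_remote(pathlib.Path('/some/dir/some-file.txt'), gsutilwrap.stat(url='gs://some-bucket/some-path/some-file.txt').md5)` would tell whether a file copied to `/some/dir` still matches its source.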

Installation

  • Create a virtual environment:

python3 -m venv venv3
  • Activate it:

source venv3/bin/activate
  • Install gsutilwrap with pip:

pip3 install gsutilwrap

Development

  • Check out the repository.

  • In the repository root, create the virtual environment:

python3 -m venv venv3
  • Activate the virtual environment:

source venv3/bin/activate
  • Install the development dependencies:

pip3 install -e .[dev]
  • We provide a set of live tests. The live tests need an existing bucket in Google Cloud Storage. Set the URL prefix used by all the live tests via the environment variable TEST_GSUTILWRAP_URL_PREFIX.

    Note that the live tests use Google Cloud resources for which you will be billed. Always check that no resources are still in use after the tests have finished so that you don't incur unnecessary costs!

  • We use tox for testing and packaging the distribution. Assuming that the virtual environment has been activated and the development dependencies have been installed, run:

tox
  • We also provide a set of pre-commit checks that lint and check code for formatting. Run them locally from an activated virtual environment with development dependencies:

./precommit.py
  • The pre-commit script can also automatically format the code:

./precommit.py --overwrite
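
A sketch of how a live test might pick up TEST_GSUTILWRAP_URL_PREFIX and skip itself when the variable is not set, so that a plain test run does not touch Google Cloud. Only the variable name and the `write_text`/`read_text` calls come from this README; the helper and the test class are hypothetical and need not match the repository's actual tests.

```python
import os
import unittest
from typing import Mapping, Optional

# Variable name taken from the README; everything else here is illustrative.
URL_PREFIX_VARIABLE = "TEST_GSUTILWRAP_URL_PREFIX"


def live_test_url_prefix(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured prefix without a trailing slash, or None when unset or blank."""
    value = environ.get(URL_PREFIX_VARIABLE, "").strip()
    return value.rstrip("/") if value else None


@unittest.skipIf(live_test_url_prefix() is None,
                 URL_PREFIX_VARIABLE + " is not set, skipping live tests")
class TestLiveTextRoundTrip(unittest.TestCase):
    def test_write_then_read_text(self) -> None:
        import gsutilwrap  # imported lazily so that skipped runs do not need it

        prefix = live_test_url_prefix()
        assert prefix is not None
        url = prefix + "/some-file.txt"
        gsutilwrap.write_text(url=url, text="some text")
        self.assertEqual(gsutilwrap.read_text(url=url), "some text")
```

This test leaves a small object under the prefix; remove it afterwards (for example with `gsutil rm`) in line with the cost warning in the list above.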

Versioning

We follow Semantic Versioning. The version X.Y.Z indicates:

  • X is the major version (backward-incompatible),

  • Y is the minor version (backward-compatible), and

  • Z is the patch version (backward-compatible bug fix).
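
The versioning rule can be turned into a small compatibility check. A sketch, assuming plain `X.Y.Z` strings without pre-release or build suffixes; `parse_version` and `is_backward_compatible_upgrade` are hypothetical names.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split 'X.Y.Z' into integers; pre-release and build suffixes are not supported."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_backward_compatible_upgrade(installed: str, candidate: str) -> bool:
    """True if moving from 'installed' to 'candidate' keeps X and does not go backwards."""
    old, new = parse_version(installed), parse_version(candidate)
    if old[0] == 0:
        # Semantic Versioning allows anything to change while X is 0, so only an identical version is safe.
        return new == old
    return new[0] == old[0] and new >= old
```

In a requirements file the same rule is commonly written as `gsutilwrap>=1.1.1,<2`.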


Download files

Source Distribution

gsutilwrap-1.1.1.tar.gz (11.5 kB)

File details

Details for the file gsutilwrap-1.1.1.tar.gz.

File metadata

  • Download URL: gsutilwrap-1.1.1.tar.gz
  • Size: 11.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/20.7.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.5.2

File hashes

Hashes for gsutilwrap-1.1.1.tar.gz

Algorithm    Hash digest
SHA256       54e04fcd9494b6fa1144033b2e7d7e8664159339a38b6de48f6757da74e57d76
MD5          4fee64bde7ec0c5666f84a36aa6589f3
BLAKE2b-256  dba59d602cb99768437b75a2f2b4d1a9b89461a1425e574e383ced7cf64bea85