Skip to main content

Access Azure Datalake Gen1 with fsspec and dask

Project description

Dask interface to Azure-Datalake Gen1 and Gen2 Storage

Warning: this code is experimental and untested.

Quickstart

This package is on PyPi and can be installed using:

pip install adlfs

To use the Gen1 filesystem:

import dask.dataframe as dd
from fsspec.registry import known_implementations
known_implementations['adl'] = {'class': 'adlfs.AzureDatalakeFileSystem'}
STORAGE_OPTIONS={'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}

dd.read_csv('adl://{STORE_NAME}/{FOLDER}/*.csv', storage_options=STORAGE_OPTIONS)

To use the Gen2 filesystem:

import dask.dataframe as dd
from fsspec.registry import known_implementations
known_implementations['abfs'] = {'class': 'adlfs.AzureBlobFileSystem'}
STORAGE_OPTIONS={'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}

ddf = dd.read_csv('abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=STORAGE_OPTIONS)
ddf = dd.read_csv('abfs://{CONTAINER}/folder.parquet', storage_options=STORAGE_OPTIONS)

Details

The package includes pythonic filesystem implementations for both Azure Datalake Gen1 and Azure Datalake Gen2, that facilitate interactions between both Azure Datalake implementations and Dask. This is done leveraging the intake/filesystem_spec base class and Azure Python SDKs.

Operations against both Gen1 Datalake currently only work with an Azure ServicePrincipal with suitable credentials to perform operations on the resources of choice.

Operations against the Gen2 Datalake are implemented by leveraging multi-protocol access, using the Azure Blob Storage Python SDK. The AzureBlobFileSystem accepts all of the BlockBlobService arguments.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

adlfs-0.1.3.tar.gz (8.3 kB view details)

Uploaded Source

File details

Details for the file adlfs-0.1.3.tar.gz.

File metadata

  • Download URL: adlfs-0.1.3.tar.gz
  • Upload date:
  • Size: 8.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.2.post20191201 requests-toolbelt/0.9.1 tqdm/4.40.2 CPython/3.7.3

File hashes

Hashes for adlfs-0.1.3.tar.gz
Algorithm Hash digest
SHA256 041d9506614cbf468fdb533d8201f266ad04d0652a1e6c2ff468b77a57424959
MD5 962f46e8b03174d7715a0bbb3cec124d
BLAKE2b-256 60ac46e4aaadb2d86aac15852770af0ea189116eee086c38d106c2c1f9b85e15

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page