Access Azure Datalake Gen1 with fsspec and dask

Project description

Dask interface to Azure-Datalake Gen1 and Gen2 Storage

Quickstart

This package can be installed using:

pip install adlfs

or

conda install -c conda-forge adlfs

The adl:// and abfs:// protocols are included in fsspec's known_implementations registry in fsspec > 0.6.1. With older versions, users must explicitly register the adlfs protocols with fsspec.
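As a rough illustration of that explicit registration, the sketch below checks which protocols fsspec does not yet know and registers only those. It assumes an fsspec version that exposes `register_implementation` (older releases may not), and the class path `adlfs.AzureDatalakeFileSystem` is an assumption to verify against the installed adlfs. The helper name `protocols_to_register` is hypothetical.

```python
# Assumed class paths; verify them against your installed adlfs version.
ADLFS_PROTOCOLS = {
    "adl": "adlfs.AzureDatalakeFileSystem",
    "abfs": "adlfs.AzureBlobFileSystem",
}


def protocols_to_register(known, wanted=ADLFS_PROTOCOLS):
    """Return the protocol names in `wanted` that `known` does not list yet."""
    return sorted(name for name in wanted if name not in known)


try:
    from fsspec.registry import known_implementations, register_implementation
except ImportError:
    # fsspec is missing or too old to offer register_implementation.
    known_implementations, register_implementation = {}, None

if register_implementation is not None:
    for name in protocols_to_register(known_implementations):
        # A dotted string defers importing adlfs until the protocol is first used.
        register_implementation(name, ADLFS_PROTOCOLS[name])
```

On a recent fsspec the loop should find nothing to do, because both protocols are already listed.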

To use the Gen1 filesystem:

import dask.dataframe as dd

storage_options={'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}

dd.read_csv('adl://{STORE_NAME}/{FOLDER}/*.csv', storage_options=storage_options)
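The braces in the URL above are placeholders to fill in before the call. One possible way to do that, and to keep credentials out of the source file, is sketched here. The environment variable names and both helper names are hypothetical, not part of adlfs.

```python
import os


def gen1_storage_options(env=os.environ):
    """Collect service principal credentials from environment variables.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    return {
        "tenant_id": env["AZURE_TENANT_ID"],
        "client_id": env["AZURE_CLIENT_ID"],
        "client_secret": env["AZURE_CLIENT_SECRET"],
    }


def gen1_csv_glob(store_name, folder):
    """Build the adl:// glob pattern matching every CSV file in a folder."""
    return f"adl://{store_name}/{folder.strip('/')}/*.csv"


# Possible usage (requires dask, adlfs and real credentials):
# import dask.dataframe as dd
# ddf = dd.read_csv(gen1_csv_glob("mystore", "raw"),
#                   storage_options=gen1_storage_options())
```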

To use the Gen2 filesystem, use either the abfs or the az protocol:

import dask.dataframe as dd

storage_options={'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}

ddf = dd.read_csv('abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=storage_options)
ddf = dd.read_parquet('az://{CONTAINER}/folder.parquet', storage_options=storage_options)

To read from a public storage blob, you must specify the 'account_name'. For example, you can access the NYC Taxi & Limousine Commission dataset as follows:

storage_options = {'account_name': 'azureopendatastorage'}
ddf = dd.read_parquet('az://nyctlc/green/puYear=2019/puMonth=*/*.parquet', storage_options=storage_options)
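In the Gen2 URLs above, the first component after the protocol is the container and the remainder is the path inside it. The standard-library sketch below makes that split explicit. `split_blob_url` is a hypothetical helper, not an adlfs function, and it does not handle glob patterns containing `?`, which `urlsplit` would treat as the start of a query string.

```python
from urllib.parse import urlsplit


def split_blob_url(url):
    """Split an abfs:// or az:// URL into (container, path inside the container)."""
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "az"):
        raise ValueError(f"expected an abfs:// or az:// URL, got {url!r}")
    return parts.netloc, parts.path.lstrip("/")


container, path = split_blob_url("az://nyctlc/green/puYear=2019/puMonth=*/*.parquet")
print(container)  # nyctlc
print(path)       # green/puYear=2019/puMonth=*/*.parquet
```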

Details

The package includes Pythonic filesystem implementations for both Azure Datalake Gen1 and Azure Datalake Gen2, which facilitate interactions between these Azure Datalake implementations and Dask. They build on the intake/filesystem_spec base class and the Azure Python SDKs.

Operations against the Gen1 Datalake currently only work with an Azure ServicePrincipal that has suitable credentials to perform operations on the resources of choice.

Operations against the Gen2 Datalake are implemented by leveraging the Azure Blob Storage Python SDK. The AzureBlobFileSystem accepts all of the Async BlobServiceClient arguments.

By default, write operations create BlockBlobs in Azure, which cannot be appended to once written. It is possible to create an AppendBlob by passing `mode="ab"` when creating the blob, and again when operating on it. Currently, AppendBlobs are not available if hierarchical namespaces are enabled.
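A minimal sketch of the append pattern follows. It assumes that `mode="ab"` is passed to the filesystem object's `open` method, as with any fsspec-style filesystem. `append_bytes` is a hypothetical helper, and the account and path values in the usage comment are placeholders.

```python
def append_bytes(fs, path, data):
    """Append `data` to `path` by opening it in binary append mode."""
    # Assumption: the filesystem accepts mode="ab" in open(); with adlfs this
    # is expected to create or extend an AppendBlob rather than a BlockBlob.
    with fs.open(path, mode="ab") as f:
        f.write(data)


# Possible usage (requires fsspec, adlfs and real credentials):
# import fsspec
# fs = fsspec.filesystem("abfs", account_name=ACCOUNT_NAME, account_key=ACCOUNT_KEY)
# append_bytes(fs, "CONTAINER/logs/events.log", b"first line\n")
# append_bytes(fs, "CONTAINER/logs/events.log", b"second line\n")
```

Using the same mode for the first write and for later writes matches the description above, since a blob first written in the default mode is a BlockBlob and cannot be appended to afterwards.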


Download files

Source Distribution

adlfs-0.5.9.tar.gz (36.9 kB view details)

Uploaded Source

Built Distribution

adlfs-0.5.9-py3-none-any.whl (20.3 kB view details)

Uploaded Python 3

File details

Details for the file adlfs-0.5.9.tar.gz.

File metadata

  • Download URL: adlfs-0.5.9.tar.gz
  • Upload date:
  • Size: 36.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.8.5

File hashes

Hashes for adlfs-0.5.9.tar.gz
Algorithm Hash digest
SHA256 674a6147a61aeedb2335430b8cd404c923f568f2d374f821ea39510ae1710d98
MD5 0b92ac242d5d12531367f1dfdcada3c9
BLAKE2b-256 cc32a2d81048544ff9252532f7a99e70d1dab640eda268c3ea162d6892908d26

File details

Details for the file adlfs-0.5.9-py3-none-any.whl.

File metadata

  • Download URL: adlfs-0.5.9-py3-none-any.whl
  • Upload date:
  • Size: 20.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.9.1

File hashes

Hashes for adlfs-0.5.9-py3-none-any.whl
Algorithm Hash digest
SHA256 11832acc36c48a7106840189804bdd2d7b326cc3d7e81cfacead50847bbbf2df
MD5 4d760828f1af63f22ed68f5b3aaf71c0
BLAKE2b-256 a5ebf112f24f8c0b9248e554373bb91e0e3f43ad8eec00bca5f4cd4b2681d5bc
