adlfs

Access Azure Datalake Gen1 with fsspec and dask

These details have not been verified by PyPI

Project links

Homepage

Intended Audience
- Developers
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language

Reason this release was yanked:

Changing behavior with anonymous logins to public repos causes user issues

Project description

Filesystem interface to Azure-Datalake Gen1 and Gen2 Storage

Quickstart

This package can be installed using:

pip install adlfs

conda install -c conda-forge adlfs

The adl:// and abfs:// protocols are included in fsspec's known_implementations registry in fsspec > 0.6.1. With earlier versions, users must explicitly register the supported adlfs protocols with fsspec.
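
For older fsspec versions, explicit registration might look like the following sketch. It assumes that fsspec.registry.known_implementations is a plain dictionary mapping protocol names to {"class": ...} entries, and that the filesystem classes are importable as adlfs.AzureDatalakeFileSystem and adlfs.AzureBlobFileSystem. The helper name register_adlfs_protocols is hypothetical.

```python
# Assumed dotted paths of the adlfs filesystem classes, keyed by protocol.
ADLFS_PROTOCOLS = {
    "adl": "adlfs.AzureDatalakeFileSystem",
    "abfs": "adlfs.AzureBlobFileSystem",
}


def register_adlfs_protocols(registry):
    """Add the adlfs protocols to a known_implementations-style mapping.

    Existing entries are left untouched, so a newer fsspec keeps its own.
    """
    for protocol, class_path in ADLFS_PROTOCOLS.items():
        registry.setdefault(protocol, {"class": class_path})
    return registry


# Demonstrated on a plain dict; with fsspec installed one would pass
# fsspec.registry.known_implementations instead.
print(register_adlfs_protocols({}))
```

Using setdefault means the call is harmless on fsspec versions that already know the protocols.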

To use the Gen1 filesystem:

import dask.dataframe as dd

storage_options={'tenant_id': TENANT_ID, 'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET}

dd.read_csv('adl://{STORE_NAME}/{FOLDER}/*.csv', storage_options=storage_options)

To use the Gen2 filesystem you can use the protocol abfs or az:

import dask.dataframe as dd

storage_options={'account_name': ACCOUNT_NAME, 'account_key': ACCOUNT_KEY}

ddf = dd.read_csv('abfs://{CONTAINER}/{FOLDER}/*.csv', storage_options=storage_options)
ddf = dd.read_parquet('az://{CONTAINER}/folder.parquet', storage_options=storage_options)
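
The braces in the paths above are placeholders for the reader to fill in. A minimal sketch of building such URLs with an f-string follows. The helper name blob_url and the container and folder names are made up for illustration.

```python
def blob_url(container, path, protocol="abfs"):
    """Build a Gen2 URL such as abfs://<container>/<path>."""
    if protocol not in ("abfs", "az"):
        raise ValueError(f"unsupported protocol: {protocol!r}")
    # Strip stray slashes so the parts join with exactly one separator.
    return f"{protocol}://{container.strip('/')}/{path.lstrip('/')}"


print(blob_url("mycontainer", "myfolder/*.csv"))
print(blob_url("mycontainer", "folder.parquet", protocol="az"))
```

The resulting string can be passed as the first argument to dd.read_csv or dd.read_parquet together with storage_options.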

Alternatively, if AZURE_STORAGE_ACCOUNT_NAME and an AZURE_STORAGE_<CREDENTIAL> are set as environment variables, storage_options will be read from them. If none of them is specified, adlfs falls back to the azure identity library's [default authentication methods](https://docs.microsoft.com/en-us/python/api/azure-identity/azure.identity.defaultazurecredential).

To read from a public storage blob, you must specify the 'account_name'. For example, you can access the NYC Taxi & Limousine Commission dataset as:

storage_options = {'account_name': 'azureopendatastorage'}
ddf = dd.read_parquet('az://nyctlc/green/puYear=2019/puMonth=*/*.parquet', storage_options=storage_options)

Details

The package includes Pythonic filesystem implementations for both Azure Datalake Gen1 and Azure Datalake Gen2 that facilitate interactions between the two Azure Datalake implementations and Dask. They build on the intake/filesystem_spec base class and the Azure Python SDKs.

Operations against the Gen1 Datalake currently only work with an Azure ServicePrincipal that has suitable credentials to perform operations on the resources of choice.

Operations against the Gen2 Datalake are implemented using the Azure Blob Storage Python SDK.

The filesystem can be instantiated with a variety of credentials, including:
    account_name
    account_key
    sas_token
    connection_string
    Azure ServicePrincipal credentials (which requires tenant_id, client_id, client_secret)
    location_mode: valid values are "primary" or "secondary"; applies to RA-GRS accounts
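
Because the ServicePrincipal option needs all three of tenant_id, client_id and client_secret, it can help to catch a partial set before building storage_options. The following is a sketch only: check_service_principal is a hypothetical helper and not part of adlfs.

```python
SERVICE_PRINCIPAL_KEYS = ("tenant_id", "client_id", "client_secret")


def check_service_principal(storage_options):
    """Raise ValueError if only some of the ServicePrincipal keys are set."""
    present = [k for k in SERVICE_PRINCIPAL_KEYS if storage_options.get(k)]
    if present and len(present) < len(SERVICE_PRINCIPAL_KEYS):
        missing = [k for k in SERVICE_PRINCIPAL_KEYS if k not in present]
        raise ValueError(
            f"incomplete ServicePrincipal credentials, missing: {missing}"
        )
    # Either all three keys are present or none are; pass the options through.
    return storage_options


options = check_service_principal(
    {"account_name": "myaccount", "account_key": "not-a-real-key"}
)
print(sorted(options))
```

Options that use another credential type, such as account_key or sas_token, pass through unchanged.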

The following environment variables can also be set and are picked up for authentication:
    "AZURE_STORAGE_CONNECTION_STRING"
    "AZURE_STORAGE_ACCOUNT_NAME"
    "AZURE_STORAGE_ACCOUNT_KEY"
    "AZURE_STORAGE_SAS_TOKEN"
    "AZURE_STORAGE_CLIENT_SECRET"
    "AZURE_STORAGE_CLIENT_ID"
    "AZURE_STORAGE_TENANT_ID"
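
The variable names above line up with the credential keys listed earlier. The sketch below shows one way to collect them into a storage_options dictionary by hand. It illustrates the naming only and is not adlfs's own lookup logic; the helper name and the one-to-one mapping (for example AZURE_STORAGE_SAS_TOKEN to sas_token) are assumptions.

```python
import os

# Assumed mapping from environment variable to storage_options key.
ENV_TO_OPTION = {
    "AZURE_STORAGE_CONNECTION_STRING": "connection_string",
    "AZURE_STORAGE_ACCOUNT_NAME": "account_name",
    "AZURE_STORAGE_ACCOUNT_KEY": "account_key",
    "AZURE_STORAGE_SAS_TOKEN": "sas_token",
    "AZURE_STORAGE_CLIENT_SECRET": "client_secret",
    "AZURE_STORAGE_CLIENT_ID": "client_id",
    "AZURE_STORAGE_TENANT_ID": "tenant_id",
}


def storage_options_from_env(environ=None):
    """Return a storage_options dict built from the variables that are set."""
    environ = os.environ if environ is None else environ
    return {
        option: environ[variable]
        for variable, option in ENV_TO_OPTION.items()
        if environ.get(variable)  # skip unset or empty variables
    }


# Only the keys are printed, so credential values never reach the console.
print(sorted(storage_options_from_env()))
```

Passing a dictionary as environ makes the helper easy to exercise without touching the real environment.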

The AzureBlobFileSystem accepts all of the Async BlobServiceClient arguments.

By default, write operations create BlockBlobs in Azure, which cannot be appended to once written. It is possible to create an AppendBlob by passing `mode="ab"` when creating the blob and again when operating on it. Currently, AppendBlobs are not available if hierarchical namespaces are enabled.
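
As a sketch of the append workflow described above, the helper below opens a path with mode="ab" on every write. It assumes fs is an AzureBlobFileSystem (or any fsspec-style filesystem whose open method accepts a mode) on an account without hierarchical namespaces. The name append_bytes is hypothetical.

```python
def append_bytes(fs, path, data):
    """Write data to path through a file object opened in append mode."""
    # The same mode="ab" is used for the first write and for later ones.
    with fs.open(path, mode="ab") as f:
        f.write(data)
```

With a filesystem instance at hand, this could be called as append_bytes(fs, "container/log.txt", b"line\n"), where the container and file names are placeholders.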

Release history

2024.7.0

Jul 22, 2024

2024.4.1

Apr 15, 2024

2024.4.0

Apr 13, 2024

2024.2.0

Feb 5, 2024

2024.1.0

Jan 29, 2024

2023.12.0

Dec 23, 2023

2023.10.0

Oct 17, 2023

2023.9.0

Sep 17, 2023

2023.8.0

Aug 8, 2023

2023.4.0

Apr 27, 2023

2023.1.0

Jan 17, 2023

2022.11.2

Nov 24, 2022

2022.11.1

Nov 24, 2022

2022.11.0 yanked

Nov 23, 2022

Reason this release was yanked:

AzureDatalakeFileSystem fails import

2022.10.0

Oct 3, 2022

2022.9.1

Sep 6, 2022

2022.9.0

Sep 6, 2022

2022.7.0

Jul 9, 2022

2022.4.0

Apr 15, 2022

2022.4.0a0 pre-release

Apr 15, 2022

2022.2.0

Feb 5, 2022

2021.10.0

Oct 3, 2021

2021.9.1

Sep 10, 2021

2021.8.2

Aug 18, 2021

2021.8.1

Aug 13, 2021

2021.7.1

Jul 19, 2021

This version

2021.7.0 yanked

Jul 12, 2021

Reason this release was yanked:

Changing behavior with anonymous logins to public repos causes user issues

0.7.7

Jun 14, 2021

0.7.6

Jun 9, 2021

0.7.5

May 11, 2021

0.7.4

Apr 26, 2021

0.7.3

Apr 15, 2021

0.7.2

Apr 12, 2021

0.7.1

Apr 9, 2021

0.7.0

Mar 31, 2021

0.6.3

Feb 16, 2021

0.6.2

Feb 12, 2021

0.6.1

Feb 9, 2021

0.6.0

Jan 15, 2021

0.5.9

Dec 19, 2020

0.5.8

Dec 9, 2020

0.5.7

Nov 19, 2020

0.5.5

Oct 6, 2020

0.5.4

Oct 4, 2020

0.5.3

Sep 15, 2020

0.5.2 yanked

Sep 15, 2020

0.5.1

Sep 10, 2020

0.5.0

Sep 7, 2020

0.4.0

Aug 20, 2020

0.3.3

Aug 13, 2020

0.3.2

Aug 2, 2020

0.3.1

Jun 15, 2020

0.3.0

May 19, 2020

0.2.5

May 19, 2020

0.2.4

Apr 21, 2020

0.2.3

Apr 21, 2020

0.2.2

Apr 20, 2020

0.2.0

Feb 15, 2020

0.1.5

Dec 17, 2019

0.1.4

Dec 16, 2019

0.1.3

Dec 15, 2019

0.1.3a0 pre-release

Dec 16, 2019

0.1.2

Nov 25, 2019

0.1.1

Nov 14, 2019

0.1.0

Oct 20, 2019

0.0.11

Oct 15, 2019

0.0.10.post2

Oct 14, 2019

0.0.10.post1

Oct 14, 2019

0.0.10.post0

Oct 14, 2019

0.0.10

Oct 14, 2019

0.0.9.post0

Oct 9, 2019

0.0.9

Oct 9, 2019

0.0.8.post3

Oct 9, 2019

0.0.8.post2

Oct 9, 2019

0.0.8.post1

Oct 9, 2019

0.0.8.post0

Oct 9, 2019

0.0.8

Oct 9, 2019

0.0.8a0 pre-release

Oct 9, 2019

0.0.7

Sep 23, 2019

0.0.6

Sep 19, 2019

0.0.5

Sep 11, 2019

0.0.5a0 pre-release

Sep 18, 2019

0.0.2

Aug 11, 2019

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

adlfs-2021.7.0.tar.gz (38.1 kB)

Uploaded Jul 12, 2021 Source

Hashes for adlfs-2021.7.0.tar.gz

Algorithm	Hash digest
SHA256	`e4b76ae974389498db8fced0890c1de8225299d544d4913c1645cec0ff6a8a9a`
MD5	`5ada403aed618db527db3794db92470b`
BLAKE2b-256	`bcd7a36bfdb7b991fcf45858a30f388b0c87714fa4b3bbe7e0a197158fc2851a`