Skip to main content

Python extension for lance

Project description

Python bindings for Lance file format

Lance is a cloud-native columnar data format designed for managing large-scale computer vision datasets in production environments. Lance delivers blazing fast performance for image and video data use cases from analytics to point queries to training scans.

Why use Lance

You should use lance if you're a ML engineer looking to be 10x more productive when working with computer vision datasets:

  1. Lance saves you from having to manage multiple systems and formats for metadata, raw assets, labeling updates, and vector indices.
  2. Lance's custom column encoding means you don't need to choose between fast analytics and fast point queries.
  3. Lance has a first-class Apache Arrow integration so it's easy to create and query Lance datasets (e.g., you can directly query lance datasets using DuckDB with no extra work)
  4. Did we mention Lance is fast.

Try Lance

Install Lance from pip (use a venv, not conda):

pip install pylance duckdb

In python:

import lance
import duckdb

# Understand Label distribution of Oxford Pet Dataset
ds = lance.dataset("s3://eto-public/datasets/oxford_pet/pet.lance")
duckdb.query('select label, count(1) from ds group by label').to_arrow_table()

Caveat emptor

  • DON'T use Conda as it prefers it's on ld path and libstd etc
  • Currently only wheels are on pypi and no sdist. See below for instructions on building from source.
  • Python 3.8-3.10 is supported on Linux x86_64
  • Python 3.10 on MacOS (both x86_64 and Arm64) is supported

Developing Lance

Install python3, pip, and venv, and setup a virtual environment for Lance. Again, DO NOT USE CONDA (at least for now).

sudo apt install python3-pip python3-venv python3-dev
python3 -m venv ${HOME}/.venv/lance

Arrow C++ libs

Install Arrow C++ libs using instructions from Apache Arrow. These instructions don't include Arrow's python lib so after you go through the above, don't forget to apt install libarrow-python-dev or yum install libarrow-python-devel.

Build pyarrow

Assume CWD is where you want to put the repo:

source ${HOME}/.venv/lance/bin/activate
cd /path/to/lance/python/thirdparty
./build.sh

Make sure pyarrow works properly:

import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds

Build Lance

  1. Build the cpp lib. See lance/cpp/README.md for instructions.
  2. Build the python module in venv:
source ${HOME}/.venv/lance/bin/activate
python setup.py develop

Test the installation using the same queries in Try Lance section.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distributions

pylance-0.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

pylance-0.0.5-cp310-cp310-macosx_11_0_arm64.whl (12.9 MB view details)

Uploaded CPython 3.10 macOS 11.0+ ARM64

pylance-0.0.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

pylance-0.0.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.0 MB view details)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

File details

Details for the file pylance-0.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pylance-0.0.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8db9effe99b1a7056e136a47be6570d9ba7f52a90e130f4d5052ace1410619a5
MD5 bb4ef29aeabeb3f9594b2bafaa807de8
BLAKE2b-256 45619b9ae503b4a54fdca467ccc896ea24cc8e6829d64d99da6a734e92e89a75

See more details on using hashes here.

Provenance

File details

Details for the file pylance-0.0.5-cp310-cp310-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pylance-0.0.5-cp310-cp310-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f5608ba9e9a5a1bcee10fa107b76bfdbd97ead4b7a18dbe0145bf900892cd995
MD5 229c16d77115a77f328f8c94a5bb814e
BLAKE2b-256 5a83f30d24251a176efa607614c4e2669dca979fe3472ceffeab3b00bb096327

See more details on using hashes here.

Provenance

File details

Details for the file pylance-0.0.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pylance-0.0.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 6a0dff84398794409fb36e0981418dc9c5006763f7732fafbb9659a0d97a47ef
MD5 6bab5631a4c2ad3147b42d7c5f5910c6
BLAKE2b-256 db31b277eddf0cb37f6f7bf1307218c89a3797133932885c27a9b350284e07b6

See more details on using hashes here.

Provenance

File details

Details for the file pylance-0.0.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pylance-0.0.5-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0aa2305f449549e2cc0e2cb4ab72aa194329ec63c9f635f1333f6300ab29acb7
MD5 f5a4bc9a5927c4b14e3cf2889a1a1701
BLAKE2b-256 90aa97a2d452ecff37d31e08282ced72c61e57e234dd1a8ab5dbb249f6d1d33a

See more details on using hashes here.

Provenance

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page