Python extension for lance
Python bindings for the Lance file format
Lance is a cloud-native columnar data format designed for managing large-scale computer vision datasets in production environments. Lance delivers blazing fast performance for image and video data use cases from analytics to point queries to training scans.
Why use Lance
You should use Lance if you're an ML engineer looking to be 10x more productive when working with computer vision datasets:
- Lance saves you from having to manage multiple systems and formats for metadata, raw assets, labeling updates, and vector indices.
- Lance's custom column encoding means you don't need to choose between fast analytics and fast point queries.
- Lance has a first-class Apache Arrow integration, so it's easy to create and query Lance datasets (e.g., you can directly query Lance datasets using DuckDB with no extra work).
- Did we mention Lance is fast?
Try Lance
Install Lance from pip (use a venv, not conda):
pip install pylance duckdb
In Python:
import lance
import duckdb
# Understand Label distribution of Oxford Pet Dataset
ds = lance.dataset("s3://eto-public/datasets/oxford_pet/pet.lance")
duckdb.query('select label, count(1) from ds group by label').to_arrow_table()
Caveat emptor
- DON'T use Conda, as it prefers its own ld path, libstd, etc.
- Currently only wheels are published on PyPI; there is no sdist. See below for instructions on building from source.
- Python 3.8-3.10 is supported on Linux x86_64.
- Python 3.10 is supported on macOS (both x86_64 and Arm64).
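The support matrix above can be checked before running pip. The following is a minimal sketch, not part of Lance: `SUPPORTED` and `is_supported` are hypothetical names, and the matrix is transcribed by hand from the caveats above rather than read from the package.

```python
import platform
import sys

# Support matrix transcribed from the caveats above (assumption: it mirrors
# the published wheels; it is not read from the package itself).
SUPPORTED = {
    ("Linux", "x86_64"): {(3, 8), (3, 9), (3, 10)},
    ("Darwin", "x86_64"): {(3, 10)},
    ("Darwin", "arm64"): {(3, 10)},
}


def is_supported(system, machine, version):
    """Return True if a wheel is listed for this OS, CPU, and (major, minor)."""
    return tuple(version[:2]) in SUPPORTED.get((system, machine), set())


if __name__ == "__main__":
    # platform.system() reports "Darwin" on macOS.
    ok = is_supported(platform.system(), platform.machine(), sys.version_info)
    print("wheel listed" if ok else "no wheel listed; build from source")
```

If the check reports no listed wheel, the build-from-source steps under Developing Lance are the fallback.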
Developing Lance
Install python3, pip, and venv, and set up a virtual environment for Lance. Again, DO NOT USE CONDA (at least for now).
sudo apt install python3-pip python3-venv python3-dev
python3 -m venv ${HOME}/.venv/lance
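After activating the environment, it can help to confirm that the interpreter really is the venv one and that no Conda environment is active. This is a rough sketch using only the standard library; `in_virtualenv` and `conda_active` are hypothetical helper names, and the `CONDA_PREFIX` check is a heuristic (that variable is set by `conda activate`), not a guarantee.

```python
import os
import sys


def in_virtualenv(prefix=None, base_prefix=None):
    """True when the interpreter runs inside a venv (prefix differs from base)."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def conda_active(environ=None):
    """True when a Conda environment appears to be activated (heuristic)."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("CONDA_PREFIX"))


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    print("conda active:", conda_active())
```

For the setup described here, the first line should print True and the second False.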
Arrow C++ libs
Install Arrow C++ libs using instructions from Apache Arrow.
These instructions don't include Arrow's Python library, so after you go through the above, don't forget to apt install libarrow-python-dev or yum install libarrow-python-devel.
Build pyarrow
Assume CWD is where you want to put the repo:
source ${HOME}/.venv/lance/bin/activate
cd /path/to/lance/python/thirdparty
./build.sh
Make sure pyarrow works properly:
import pyarrow as pa
import pyarrow.parquet as pq
import pyarrow.dataset as ds
Build Lance
- Build the C++ library. See lance/cpp/README.md for instructions.
- Build the python module in venv:
source ${HOME}/.venv/lance/bin/activate
python setup.py develop
Test the installation by running the same queries as in the Try Lance section.
Download files
Built Distributions
File details
Details for the file pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 2.0 MB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
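The tags in the metadata above are encoded in the wheel filename itself, following the standard wheel naming convention (name, version, optional build tag, Python tag, ABI tag, platform tag). As an illustration only, a filename can be split into those fields as sketched below; `parse_wheel_filename` is a hypothetical helper, not a Lance or pip API.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its dash-separated fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python": python_tag,
        "abi": abi_tag,
        # A wheel may list several compatible platforms separated by dots.
        "platforms": platform_tag.split("."),
    }


if __name__ == "__main__":
    info = parse_wheel_filename(
        "pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
    )
    print(info["python"], info["platforms"])
```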
File hashes
Algorithm | Hash digest
---|---
SHA256 | 0109cf5be2771f39db5d913f5daeca53bdaf5fa70761f70f2b9e5717e419b389
MD5 | 3476632c893bdba93edf78d449a01456
BLAKE2b-256 | dbb4f4e37470123b0fc39d1ba9c9b27d9fd7c99b4b9d2d71bb5b6d0be94a2885
File details
Details for the file pylance-0.0.4-cp310-cp310-macosx_11_0_arm64.whl.
File metadata
- Download URL: pylance-0.0.4-cp310-cp310-macosx_11_0_arm64.whl
- Upload date:
- Size: 12.9 MB
- Tags: CPython 3.10, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | aabd1ccdd14615e8205fbf55364da16f9e5f1f965f653130aece68eea7924ca1
MD5 | af436a2e0f716cf080bda9f1aca8d95f
BLAKE2b-256 | c40af851ed15d1296d93398473e8aa97b4c262f6ae38d6e8e969c57ac17d7540
File details
Details for the file pylance-0.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: pylance-0.0.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 2.0 MB
- Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 9322629b68a4a6db5c5926a91ce57d24e18f4f392dbb199663c44ff1ea0b5b8c
MD5 | ab86427d4eff06c2edc3210042c4f192
BLAKE2b-256 | 5bdc601bce146c2fea8c0596e4ab7c0087d5150d3329006a530654655d913fa7
File details
Details for the file pylance-0.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: pylance-0.0.4-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 2.0 MB
- Tags: CPython 3.8, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | a283427d48b96bf3b1df9306fe91b77dc768dfcc5cf842ea8deb90f21810ff9d
MD5 | 3ba0ec4ba87e6865bb9876f0cf35ed1b
BLAKE2b-256 | d0fda80bb8ac4acfcf35ea9b07154d33d936baecc248cf9db245349c86366526
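The SHA256 digests listed for each file can be used to check a downloaded wheel before installing it. Below is a minimal sketch using Python's standard `hashlib`; `sha256_of` and `matches` are hypothetical helper names, and the local path in the usage comment is an assumption about where the wheel was saved.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage (assumes the wheel was downloaded to the current directory):
# matches(
#     "pylance-0.0.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
#     "0109cf5be2771f39db5d913f5daeca53bdaf5fa70761f70f2b9e5717e419b389",
# )
```

Reading in chunks keeps memory use flat even for the larger macOS wheel.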