Substrait

A Python package for Substrait, the cross-language specification for data compute operations.

Installation

You can install the Python Substrait bindings from PyPI or conda-forge:

pip install substrait
conda install -c conda-forge python-substrait  # or use mamba
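
To confirm that the install succeeded, one option is to ask the standard library for the distribution metadata. The sketch below assumes the distribution is registered under the PyPI name `substrait` (a conda-forge install may or may not expose the same name); `installed_version` is a hypothetical helper, not part of the package.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "substrait"):
    """Return the installed version string, or None if the distribution is absent.

    `installed_version` is an illustrative helper; the default name assumes
    the PyPI distribution name used in the pip command.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
print(f"substrait version: {found}" if found else "substrait is not installed")
```

Returning `None` instead of raising keeps the check usable in setup scripts that only need to branch on whether the package is present.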

Goals

This project aims to provide a Python interface for the Substrait specification. It will allow users to construct and manipulate a Substrait Plan from Python for evaluation by a Substrait consumer, such as DataFusion or DuckDB.

Non-goals

This project is not an execution engine for Substrait Plans.

Status

This is an experimental package that is still under development.

Example

Produce a Substrait Plan

The substrait.proto module exposes the classes that represent a Substrait Plan, so new plans can be built directly from Python.

Here is an example plan equivalent to SELECT first_name FROM people, where the people table has first_name and surname columns of type String:

>>> from substrait import proto
>>> plan = proto.Plan(
...   relations=[
...     proto.PlanRel(
...       root=proto.RelRoot(
...         names=["first_name"], 
...         input=proto.Rel(
...           read=proto.ReadRel(
...             named_table=proto.ReadRel.NamedTable(names=["people"]),
...             base_schema=proto.NamedStruct(
...               names=["first_name", "surname"], 
...               struct=proto.Type.Struct(
...                 types=[
...                   proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED)), 
...                   proto.Type(string=proto.Type.String(nullability=proto.Type.Nullability.NULLABILITY_REQUIRED))
...                 ]  # /types
...               )  # /struct
...             )  # /base_schema
...           )  # /read
...         )  # /input
...       )  # /root
...     )  # /PlanRel
...   ]  # /relations
... )
>>> print(plan)
relations {
  root {
    input {
      read {
        base_schema {
          names: "first_name"
          names: "surname"
          struct {
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
          }
        }
        named_table {
          names: "people"
        }
      }
    }
    names: "first_name"
  }
}
>>> serialized_plan = plan.SerializeToString()
>>> serialized_plan
b'\x1aA\x12?\n1\n/\x12#\n\nfirst_name\n\x07surname\x12\x0c\n\x04b\x02\x10\x02\n\x04b\x02\x10\x02:\x08\n\x06people\x12\nfirst_name'
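
The serialized bytes follow the protobuf wire format, where each field starts with a tag that encodes a field number and a wire type. The following is a minimal sketch, using only the standard library, that walks the outer layers of the bytes shown above. The helper names `read_varint` and `length_delimited_fields` are hypothetical, and the sketch only handles length-delimited fields, which is all that the outer layers of this particular plan contain. Mapping field number 3 to `relations` is an assumption based on the Substrait `Plan` message definition and is worth checking against the proto files.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last varint byte
            return value, pos
        shift += 7


def length_delimited_fields(buf: bytes):
    """Yield (field_number, payload) for each length-delimited field in buf."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type != 2:
            raise ValueError(f"sketch only handles wire type 2, got {wire_type}")
        length, pos = read_varint(buf, pos)
        yield field_number, buf[pos:pos + length]
        pos += length


# The same bytes as printed above, split across two literals for readability.
serialized_plan = (
    b'\x1aA\x12?\n1\n/\x12#\n\nfirst_name\n\x07surname\x12\x0c\n\x04b\x02\x10\x02'
    b'\n\x04b\x02\x10\x02:\x08\n\x06people\x12\nfirst_name'
)

plan_fields = list(length_delimited_fields(serialized_plan))
print(len(serialized_plan))                          # 67
print([(n, len(p)) for n, p in plan_fields])         # [(3, 65)]

# Descend one level at a time: Plan field 3, then its field 2, then that payload.
plan_rel_fields = list(length_delimited_fields(plan_fields[0][1]))
root_fields = list(length_delimited_fields(plan_rel_fields[0][1]))
print(root_fields[-1])                               # (2, b'first_name')
```

The last line recovers the `names: "first_name"` entry of the root relation, which matches the text representation printed earlier.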

Consume the Substrait Plan

The plan generated in the previous example can be loaded back from its binary representation using the Plan.ParseFromString method:

>>> from substrait.proto import Plan
>>> p = Plan()
>>> p.ParseFromString(serialized_plan)
67
>>> p
relations {
  root {
    input {
      read {
        base_schema {
          names: "first_name"
          names: "surname"
          struct {
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
            types {
              string {
                nullability: NULLABILITY_REQUIRED
              }
            }
          }
        }
        named_table {
          names: "people"
        }
      }
    }
    names: "first_name"
  }
}

Produce a Substrait Plan with Ibis

Let's use an existing Substrait producer, Ibis, to provide an example using Python Substrait as the consumer.

In [1]: import ibis

In [2]: movie_ratings = ibis.table(
   ...:     [
   ...:         ("tconst", "str"),
   ...:         ("averageRating", "str"),
   ...:         ("numVotes", "str"),
   ...:     ],
   ...:     name="ratings",
   ...: )
   ...:

In [3]: query = movie_ratings.select(
   ...:     movie_ratings.tconst,
   ...:     avg_rating=movie_ratings.averageRating.cast("float"),
   ...:     num_votes=movie_ratings.numVotes.cast("int"),
   ...: )

In [4]: from ibis_substrait.compiler.core import SubstraitCompiler

In [5]: compiler = SubstraitCompiler()

In [6]: protobuf_msg = compiler.compile(query).SerializeToString()

In [7]: from substrait.proto import Plan

In [8]: my_plan = Plan()

In [9]: my_plan.ParseFromString(protobuf_msg)
Out[9]: 186

In [10]: print(my_plan)
relations {
  root {
    input {
      project {
        common {
          emit {
            output_mapping: 3
            output_mapping: 4
            output_mapping: 5
          }
        }
        input {
          read {
            common {
              direct {
              }
            }
            base_schema {
              names: "tconst"
              names: "averageRating"
              names: "numVotes"
              struct {
                types {
                  string {
                    nullability: NULLABILITY_NULLABLE
                  }
                }
                types {
                  string {
                    nullability: NULLABILITY_NULLABLE
                  }
                }
                types {
                  string {
                    nullability: NULLABILITY_NULLABLE
                  }
                }
                nullability: NULLABILITY_REQUIRED
              }
            }
            named_table {
              names: "ratings"
            }
          }
        }
        expressions {
          selection {
            direct_reference {
              struct_field {
              }
            }
            root_reference {
            }
          }
        }
        expressions {
          cast {
            type {
              fp64 {
                nullability: NULLABILITY_NULLABLE
              }
            }
            input {
              selection {
                direct_reference {
                  struct_field {
                    field: 1
                  }
                }
                root_reference {
                }
              }
            }
            failure_behavior: FAILURE_BEHAVIOR_THROW_EXCEPTION
          }
        }
        expressions {
          cast {
            type {
              i64 {
                nullability: NULLABILITY_NULLABLE
              }
            }
            input {
              selection {
                direct_reference {
                  struct_field {
                    field: 2
                  }
                }
                root_reference {
                }
              }
            }
            failure_behavior: FAILURE_BEHAVIOR_THROW_EXCEPTION
          }
        }
      }
    }
    names: "tconst"
    names: "avg_rating"
    names: "num_votes"
  }
}
version {
  minor_number: 24
  producer: "ibis-substrait"
}
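
The `emit` block in the printed plan can be read as follows: in the Substrait specification, a project relation's direct output is its input fields followed by its expressions, and `output_mapping` selects which of those positions are emitted. The sketch below models that selection with plain Python lists. `apply_emit` is a hypothetical helper and the expression labels are illustrative strings, not objects from the package. Note that the first expression's empty `struct_field { }` refers to field 0, because protobuf text output omits default values.

```python
def apply_emit(input_fields, expressions, output_mapping):
    """Model a project relation's emit: pick positions from inputs + expressions."""
    direct_output = list(input_fields) + list(expressions)
    return [direct_output[index] for index in output_mapping]


# Fields of the "ratings" table occupy positions 0, 1 and 2.
input_fields = ["tconst", "averageRating", "numVotes"]

# The three project expressions occupy positions 3, 4 and 5 (labels are illustrative).
expressions = [
    "tconst",
    "cast(averageRating as fp64)",
    "cast(numVotes as i64)",
]

selected = apply_emit(input_fields, expressions, output_mapping=[3, 4, 5])
print(selected)
# ['tconst', 'cast(averageRating as fp64)', 'cast(numVotes as i64)']
```

The three selected positions line up with the root names `tconst`, `avg_rating` and `num_votes` in the plan above.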
