Databricks SQLAlchemy plugin for Python

These details have not been verified by PyPI

Project links

Project description

Databricks dialect for SQLALchemy 1.0

The Databricks dialect for SQLAlchemy serves as bridge between SQLAlchemy and the Databricks SQL Python driver. A working example demonstrating usage can be found in example.py.

Installation

To install the dialect and its dependencies:

pip install databricks-sqlalchemy~=1.0

If you also plan to use alembic you can alternatively run:

pip install alembic

Connection String

Every SQLAlchemy application that connects to a database needs to use an Engine, which you can create by passing a connection string to create_engine. The connection string must include these components:

Host
HTTP Path for a compute resource
API access token
Initial catalog for the connection
Initial schema for the connection

Note: Our dialect is built and tested on workspaces with Unity Catalog enabled. Support for the hive_metastore catalog is untested.

For example:

import os
from sqlalchemy import create_engine

host = os.getenv("DATABRICKS_SERVER_HOSTNAME")
http_path = os.getenv("DATABRICKS_HTTP_PATH")
access_token = os.getenv("DATABRICKS_TOKEN")
catalog = os.getenv("DATABRICKS_CATALOG")
schema = os.getenv("DATABRICKS_SCHEMA")

if sqlalchemy.__version__.startswith("1.3"):
	# SQLAlchemy 1.3.x fails to parse the http_path, catalog, and schema from our connection string
	# Pass these in as connect_args instead

	conn_string = f"databricks://token:{access_token}@{host}"
	connect_args = dict(catalog=catalog, schema=schema, http_path=http_path)
	all_connect_args = {**extra_connect_args, **connect_args}
	engine = create_engine(conn_string, connect_args=all_connect_args)
else:
	engine = create_engine(
		f"databricks://token:{access_token}@{host}?http_path={http_path}&catalog={catalog}&schema={schema}",
		connect_args=extra_connect_args,
	)

Types

The SQLAlchemy type hierarchy contains backend-agnostic type implementations (represented in CamelCase) and backend-specific types (represented in UPPERCASE). The majority of SQLAlchemy's CamelCase types are supported. This means that a SQLAlchemy application using these types should "just work" with Databricks.

SQLAlchemy Type	Databricks SQL Type
`BigInteger`	`BIGINT`
`LargeBinary`	(not supported)
`Boolean`	`BOOLEAN`
`Date`	`DATE`
`DateTime`	`TIMESTAMP_NTZ`
`Enum`	(not supported)
`Float`	`FLOAT`
`Integer`	`INT`
`Numeric`	`DECIMAL`
`PickleType`	(not supported)
`SmallInteger`	`SMALLINT`
`String`	`STRING`
`Text`	`STRING`
`Time`	`STRING`
`Unicode`	`STRING`
`UnicodeText`	`STRING`
`Uuid`	`STRING`

`LargeBinary()` and `PickleType()`

Databricks Runtime doesn't currently support binding of binary values in SQL queries, which is a pre-requisite for this functionality in SQLAlchemy.

`Enum()` and `CHECK` constraints

Support for CHECK constraints is not implemented in this dialect. Support is planned for a future release.

SQLAlchemy's Enum() type depends on CHECK constraints and is therefore not yet supported.

`String()`, `Text()`, `Unicode()`, and `UnicodeText()`

Databricks Runtime doesn't support length limitations for STRING fields. Therefore String() or String(1) or String(255) will all produce identical DDL. Since Text(), Unicode(), UnicodeText() all use the same underlying type in Databricks SQL, they will generate equivalent DDL.

`Time()`

Databricks Runtime doesn't have a native time-like data type. To implement this type in SQLAlchemy, our dialect stores SQLAlchemy Time() values in a STRING field. Unlike DateTime above, this type can optionally support timezone awareness (since the dialect is in complete control of the strings that we write to the Delta table).

from sqlalchemy import Time

class SomeModel(Base):
    time_tz     = Time(timezone=True)
    time_ntz    = Time()

Usage Notes

`Identity()` and `autoincrement`

Identity and generated value support is currently limited in this dialect.

When defining models, SQLAlchemy types can accept an autoincrement argument. In our dialect, this argument is currently ignored. To create an auto-incrementing field in your model you can pass in an explicit Identity() instead.

Furthermore, in Databricks Runtime, only BIGINT fields can be configured to auto-increment. So in SQLAlchemy, you must use the BigInteger() type.

from sqlalchemy import Identity, String

class SomeModel(Base):
    id      = BigInteger(Identity())
    value   = String()

When calling Base.metadata.create_all(), the executed DDL will include GENERATED ALWAYS AS IDENTITY for the id column. This is useful when using SQLAlchemy to generate tables. However, as of this writing, Identity() constructs are not captured when SQLAlchemy reflects a table's metadata (support for this is planned).

Usage with pandas

Use pandas.DataFrame.to_sql and pandas.read_sql to write and read from Databricks SQL. These methods both accept a SQLAlchemy connection to interact with Databricks.

Read from Databricks SQL into pandas

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("databricks://token:dapi***@***.cloud.databricks.com?http_path=***&catalog=main&schema=test")
with engine.connect() as conn:
    # This will read the contents of `main.test.some_table`
    df = pd.read_sql("some_table", conn)

Write to Databricks SQL from pandas

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("databricks://token:dapi***@***.cloud.databricks.com?http_path=***&catalog=main&schema=test")
squares = [(i, i * i) for i in range(100)]
df = pd.DataFrame(data=squares,columns=['x','x_squared'])

with engine.connect() as conn:
    # This will write the contents of `df` to `main.test.squares`
    df.to_sql('squares',conn)

`PrimaryKey()` and `ForeignKey()`

Unity Catalog workspaces in Databricks support PRIMARY KEY and FOREIGN KEY constraints. Note that Databricks Runtime does not enforce the integrity of FOREIGN KEY constraints. You can establish a primary key by setting primary_key=True when defining a column.

When building ForeignKey or ForeignKeyConstraint objects, you must specify a name for the constraint.

If your model definition requires a self-referential FOREIGN KEY constraint, you must include use_alter=True when defining the relationship.

from sqlalchemy import Table, Column, ForeignKey, BigInteger, String

users = Table(
    "users",
    metadata_obj,
    Column("id", BigInteger, primary_key=True),
    Column("name", String(), nullable=False),
    Column("email", String()),
    Column("manager_id", ForeignKey("users.id", name="fk_users_manager_id_x_users_id", use_alter=True))
)

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

2.0.3

Nov 14, 2024

2.0.2

Nov 13, 2024

2.0.1

Nov 5, 2024

This version

1.0.4

Nov 14, 2024

1.0.3

Nov 13, 2024

1.0.2

Nov 5, 2024

0.0.1b1 pre-release

Jan 18, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

databricks_sqlalchemy-1.0.4.tar.gz (14.1 kB view details)

Uploaded Nov 14, 2024 Source

Built Distribution

databricks_sqlalchemy-1.0.4-py3-none-any.whl (13.4 kB view details)

Uploaded Nov 14, 2024 Python 3

File details

Details for the file databricks_sqlalchemy-1.0.4.tar.gz.

File metadata

Download URL: databricks_sqlalchemy-1.0.4.tar.gz
Upload date: Nov 14, 2024
Size: 14.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.6

File hashes

Hashes for databricks_sqlalchemy-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`c233383069b37db2d1d91a138a6acdd7657c05d14c57abbdbf77ffb6f2820b88`
MD5	`3251a500af67a57267347ba5ed48047f`
BLAKE2b-256	`62e009f424401e7477bc2c829dfa6c220dfa545d8b3f7708da30b273b76fe62a`

See more details on using hashes here.

File details

Details for the file databricks_sqlalchemy-1.0.4-py3-none-any.whl.

File metadata

Download URL: databricks_sqlalchemy-1.0.4-py3-none-any.whl
Upload date: Nov 14, 2024
Size: 13.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.9.6

File hashes

Hashes for databricks_sqlalchemy-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`48769f2092060784dfafc6d62dbd2e9edac81425f1186a5b9fee23ea1256c9d1`
MD5	`715851fdd4af24c8c5907747dc012807`
BLAKE2b-256	`cede48cb27fa797b2295708fb02be9edaac77a70c2aa585404dd98f870f3c802`

See more details on using hashes here.

databricks-sqlalchemy 1.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Databricks dialect for SQLALchemy 1.0

Installation

Connection String

Types

`LargeBinary()` and `PickleType()`

`Enum()` and `CHECK` constraints

`String()`, `Text()`, `Unicode()`, and `UnicodeText()`

`Time()`

Usage Notes

`Identity()` and `autoincrement`

Usage with pandas

Read from Databricks SQL into pandas

Write to Databricks SQL from pandas

`PrimaryKey()` and `ForeignKey()`

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

databricks-sqlalchemy 1.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Databricks dialect for SQLALchemy 1.0

Installation

Connection String

Types

LargeBinary() and PickleType()

Enum() and CHECK constraints

String(), Text(), Unicode(), and UnicodeText()

Time()

Usage Notes

Identity() and autoincrement

Usage with pandas

Read from Databricks SQL into pandas

Write to Databricks SQL from pandas

PrimaryKey() and ForeignKey()

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`LargeBinary()` and `PickleType()`

`Enum()` and `CHECK` constraints

`String()`, `Text()`, `Unicode()`, and `UnicodeText()`

`Time()`

`Identity()` and `autoincrement`

`PrimaryKey()` and `ForeignKey()`