Provider for Apache Airflow. Implements apache-airflow-providers-databricks package
Package apache-airflow-providers-databricks
Release: 4.0.1rc1
Provider package
This is a provider package for the databricks provider. All classes for this provider package are in the airflow.providers.databricks Python package.
You can find package information and changelog for the provider in the documentation.
Installation
You can install this package on top of an existing Airflow 2 installation (see Requirements below for the minimum Airflow version supported) via pip install apache-airflow-providers-databricks
The package supports the following Python versions: 3.7, 3.8, 3.9, 3.10.
Requirements
PIP package | Version required |
---|---|
apache-airflow | >=2.3.0 |
apache-airflow-providers-common-sql | >=1.3.1 |
requests | >=2.27,<3 |
databricks-sql-connector | >=2.0.0, <3.0.0 |
aiohttp | >=3.6.3, <4 |
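To see which of these distributions are present in an environment, a minimal sketch along the following lines may help. It only reports installed versions and does not evaluate the version ranges from the table. The helper name installed_versions is hypothetical and not part of the provider, and importlib.metadata is assumed to be available (standard library from Python 3.8).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the requirements table above.
REQUIRED_DISTRIBUTIONS = [
    "apache-airflow",
    "apache-airflow-providers-common-sql",
    "requests",
    "databricks-sql-connector",
    "aiohttp",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; it does not check version ranges.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED_DISTRIBUTIONS).items():
        print(f"{name}: {ver or 'not installed'}")
```

Compare the reported versions against the ranges in the table by hand, or with a dedicated tool such as pip check.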
Cross provider package dependencies
These are dependencies that might be needed to use all the features of the package. You need to install the specified provider packages to use them.
You can install such cross-provider dependencies when installing from PyPI. For example:
pip install apache-airflow-providers-databricks[common.sql]
Dependent package | Extra |
---|---|
apache-airflow-providers-common-sql | common.sql |
Changelog
4.0.1
Bug Fixes
DatabricksSubmitRunOperator to support taskflow (#29840)
4.0.0
Breaking changes
The DatabricksSqlHook now conforms to the same semantics as all the other DBApiHook implementations and returns the same kind of response in its run method. Previously (in pre-4.* versions of the provider), the Hook returned a Tuple of (“cursor description”, “results”), which was not compatible with other DBApiHooks that return just “results”. After this change (and the dependency on common.sql >= 1.3.1), the DatabricksSqlHook returns “results” only. The description can be retrieved via the last_description field of the hook after the run method completes.
That makes the DatabricksSqlHook suitable for the generic SQL operator and for detailed lineage analysis.
If you had custom hooks or used the Hook in your TaskFlow code or custom operators that relied on this behaviour, you need to adapt your DAGs.
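For code that still expects the old (description, results) pair from the hook, one possible adaptation is a small wrapper like the sketch below. It assumes only what is described above: run returns the results and last_description holds the cursor description afterwards. The names run_with_description and FakeSqlHook are hypothetical; FakeSqlHook is a stand-in so the example runs without Airflow or a Databricks connection, and it ignores the handler argument.

```python
def run_with_description(hook, sql, handler=None):
    """Return (description, results) the way pre-4.0 callers expected.

    Assumes `hook` follows the 4.x semantics: run() returns only the
    results and last_description holds the cursor description.
    """
    results = hook.run(sql, handler=handler)
    return hook.last_description, results


class FakeSqlHook:
    """Hypothetical stand-in for DatabricksSqlHook so the sketch runs offline."""

    def __init__(self):
        self.last_description = None

    def run(self, sql, handler=None):
        # A real hook would execute `sql` and apply `handler` to the cursor;
        # this stub just records a description and returns canned rows.
        self.last_description = [("id",), ("name",)]
        return [(1, "a"), (2, "b")]


description, results = run_with_description(FakeSqlHook(), "SELECT id, name FROM t")
print(description)
print(results)
```

With a real hook, a handler that fetches rows from the cursor is typically needed for run to return any results; check the hook documentation for the version you use.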
The DatabricksSqlOperator is also more standard: it derives from the common SQLExecuteQueryOperator and uses a more consistent approach to processing output when SQL queries are run. However, in this case the result returned by the execute method is unchanged (it still returns a Tuple of (“description”, “results”)), and this Tuple is pushed to XCom, so DAGs relying on this behaviour should continue working without any change.
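A downstream task that pulls this Tuple from XCom might unpack it roughly as in the sketch below. This is an illustration only: rows_as_dicts is a hypothetical helper, the sample value is made up, and the description is assumed to follow the Python DB-API convention in which the first item of each entry is the column name. The exact shape after XCom serialization may differ in your deployment.

```python
def rows_as_dicts(xcom_value):
    """Turn a (description, results) pair into a list of dicts keyed by column name."""
    description, results = xcom_value
    # Assumption: DB-API style description, where entry[0] is the column name.
    columns = [column[0] for column in description]
    return [dict(zip(columns, row)) for row in results]


# Hypothetical value shaped like the Tuple described above.
sample = ([("id", "int"), ("name", "string")], [(1, "a"), (2, "b")])
records = rows_as_dicts(sample)
print(records)
```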
Fix errors in Databricks SQL operator introduced when refactoring (#27854)
Bump common.sql provider to 1.3.1 (#27888)
Bug Fixes
Fix templating fields and do_xcom_push in DatabricksSQLOperator (#27868)
Fixing the behaviours of SQL Hooks and Operators finally (#27912)
3.4.0
This release of the provider is only available for Airflow 2.3+ as explained in the Apache Airflow providers support policy.
Misc
Move min airflow version to 2.3.0 for all providers (#27196)
Replace urlparse with urlsplit (#27389)
Features
Add SQLExecuteQueryOperator (#25717)
Use new job search API for triggering Databricks job by name (#27446)
3.3.0
Features
DatabricksSubmitRunOperator dbt task support (#25623)
Misc
Add common-sql lower bound for common-sql (#25789)
Remove duplicated connection-type within the provider (#26628)
Bug Fixes
Databricks: fix provider name in the User-Agent string (#25873)
3.2.0
Features
Databricks: update user-agent string (#25578)
More improvements in the Databricks operators (#25260)
Improved telemetry for Databricks provider (#25115)
Unify DbApiHook.run() method with the methods which override it (#23971)
Bug Fixes
Databricks: fix test_connection implementation (#25114)
Do not convert boolean values to string in deep_string_coerce function (#25394)
Correctly handle output of the failed tasks (#25427)
Databricks: Fix provider for Airflow 2.2.x (#25674)
3.1.0
Features
Added databricks_conn_id as templated field (#24945)
Add 'test_connection' method to Databricks hook (#24617)
Move all SQL classes to common-sql provider (#24836)
Bug Fixes
Update providers to use functools compat for 'cached_property' (#24582)
3.0.0
Breaking changes
This release of the provider is only available for Airflow 2.2+ as explained in the Apache Airflow providers support policy: https://github.com/apache/airflow/blob/main/README.md#support-for-providers
Features
Add Deferrable Databricks operators (#19736)
Add git_source to DatabricksSubmitRunOperator (#23620)
Bug Fixes
fix: DatabricksSubmitRunOperator and DatabricksRunNowOperator cannot define .json as template_ext (#23622) (#23641)
Fix UnboundLocalError when sql is empty list in DatabricksSqlHook (#23815)
2.7.0
Features
Update to the released version of DBSQL connector
DatabricksSqlOperator - switch to databricks-sql-connector 2.x
Further improvement of Databricks Jobs operators (#23199)
2.6.0
Features
More operators for Databricks Repos (#22422)
Add a link to Databricks Job Run (#22541)
Databricks SQL operators are now Python 3.10 compatible (#22886)
Bug Fixes
Databricks: Correctly handle HTTP exception (#22885)
Misc
Refactor 'DatabricksJobRunLink' to not create ad hoc TaskInstances (#22571)
2.5.0
Features
Operator for updating Databricks Repos (#22278)
Bug Fixes
Fix mistakenly added install_requires for all providers (#22382)
2.4.0
Features
Add new options to DatabricksCopyIntoOperator (#22076)
Databricks hook - retry on HTTP Status 429 as well (#21852)
Misc
Skip some tests for Databricks from running on Python 3.10 (#22221)
2.3.0
Features
Add-showing-runtime-error-feature-to-DatabricksSubmitRunOperator (#21709)
Databricks: add support for triggering jobs by name (#21663)
Added template_ext = ('.json') to databricks operators #18925 (#21530)
Databricks SQL operators (#21363)
Bug Fixes
Fixed changelog for January 2022 (delayed) provider's release (#21439)
Misc
Support for Python 3.10
Updated Databricks docs for correct jobs 2.1 API and links (#21494)
2.2.0
Features
Add 'wait_for_termination' argument for Databricks Operators (#20536)
Update connection object to 'cached_property' in 'DatabricksHook' (#20526)
Remove 'host' as an instance attr in 'DatabricksHook' (#20540)
Databricks: fix verification of Managed Identity (#20550)
2.1.0
Features
Databricks: add more methods to represent run state information (#19723)
Databricks - allow Azure SP authentication on other Azure clouds (#19722)
Databricks: allow to specify PAT in Password field (#19585)
Databricks jobs 2.1 (#19544)
Update Databricks API from 2.0 to 2.1 (#19412)
Authentication with AAD tokens in Databricks provider (#19335)
Update Databricks operators to match latest version of API 2.0 (#19443)
Remove db call from DatabricksHook.__init__() (#20180)
Bug Fixes
Fixup string concatenations (#19099)
Databricks hook: fix expiration time check (#20036)
2.0.2
Bug Fixes
Move DB call out of DatabricksHook.__init__ (#18339)
2.0.1
Misc
Optimise connection importing for Airflow 2.2.0
2.0.0
Breaking changes
Auto-apply apply_default decorator (#15667)
1.0.1
Updated documentation and readme files.
1.0.0
Initial version of the provider.
Hashes for apache-airflow-providers-databricks-4.0.1rc1.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | e953a0ce9493f44416fa669dfe8ed3eb162315921f5eff86236ff41fe25720b8 |
MD5 | e51ba30abefa544d363a0132b23bd932 |
BLAKE2b-256 | 029508d90f3a218d3720d2a5565bbb1e075bba983d0728e84324424136444015 |
Hashes for apache_airflow_providers_databricks-4.0.1rc1-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 73cbc8faa91f293045ad8f63f6b8dde1bcf14343b324664401c74c0b0e55fb37 |
MD5 | d31beebaf8204cf6bd83e934d696b7ad |
BLAKE2b-256 | 0aea837033f269ddc372cdb8745c7963363c8a6114ae02ca7b4584cc34276faa |