A unit test framework that allows you to write unit and functional tests for Data Factory pipelines against the Git-integrated JSON resource files.


Data Factory - Testing Framework

A test framework that allows you to write unit and functional tests for Data Factory pipelines against the Git-integrated JSON resource files.

Supporting currently:

Planned:

Azure Synapse Analytics

Disclaimer

This unit test framework is not officially supported. It is currently in an experimental state and has not been tested with every single Data Factory resource. It should support all activities out of the box, but it has not been thoroughly tested. Please report any issues in the issues section and include an example of the pipeline that is not working as expected.

If there is a lot of interest in this framework, we will continue to improve it and move it to a production-ready state.

Features

Goal: Validate that the evaluated pipeline configuration, including its expressions, behaves as expected at runtime.

Evaluate expressions with their functions and arguments instantly by using the framework's internal expression parser.
Test a pipeline or activity against any state to assert expected outcome. State can be configured with pipeline parameters, global parameters, variables and activity outputs.
Simulate a pipeline run and evaluate the execution flow and outcome of each activity.
Dynamically supports all activity types with all their attributes.

Pipelines and activities are not executed on any Data Factory environment, but the evaluation of the pipeline configuration is validated locally. This is different from the "validation" functionality present in the UI, which only validates the syntax of the pipeline configuration.
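
To make the idea of local evaluation concrete, the following is a toy sketch and not the framework's parser. It assumes a plain dictionary as state and supports only a flat `@concat(...)` call over string literals, `pipeline().parameters.X`, `pipeline().globalParameters.X` and `variables('X')`. The names `split_arguments`, `evaluate_term` and `evaluate_expression` are hypothetical and exist only for this illustration.

```python
import re

# Hypothetical, simplified state; the framework itself uses PipelineRunState.
state = {
    "pipeline_parameters": {"JobId": "123"},
    "global_parameters": {"BaseUrl": "https://example.com"},
    "variables": {"JobName": "Job-123"},
}


def split_arguments(text: str) -> list[str]:
    """Split on commas that are not inside single-quoted strings."""
    parts, current, in_quotes = [], "", False
    for char in text:
        if char == "'":
            in_quotes = not in_quotes
        if char == "," and not in_quotes:
            parts.append(current.strip())
            current = ""
        else:
            current += char
    parts.append(current.strip())
    return parts


def evaluate_term(term: str, state: dict) -> str:
    """Resolve one argument: a string literal or a reference into the state."""
    if term.startswith("'") and term.endswith("'"):
        return term[1:-1]
    match = re.fullmatch(r"pipeline\(\)\.parameters\.(\w+)", term)
    if match:
        return state["pipeline_parameters"][match.group(1)]
    match = re.fullmatch(r"pipeline\(\)\.globalParameters\.(\w+)", term)
    if match:
        return state["global_parameters"][match.group(1)]
    match = re.fullmatch(r"variables\('(\w+)'\)", term)
    if match:
        return state["variables"][match.group(1)]
    raise ValueError(f"Unsupported term: {term}")


def evaluate_expression(expression: str, state: dict) -> str:
    """Evaluate a flat @concat(...) expression against the given state."""
    match = re.fullmatch(r"@concat\((.*)\)", expression)
    if not match:
        raise ValueError(f"Unsupported expression: {expression}")
    return "".join(evaluate_term(term, state) for term in split_arguments(match.group(1)))


url = evaluate_expression("@concat(pipeline().globalParameters.BaseUrl, '/jobs')", state)
assert url == "https://example.com/jobs"
```

Because the expression is resolved against an in-memory state, the same expression can be checked against many different parameter values without deploying anything.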

Why

Data Factory does not support unit testing out of the box. The only way to validate your changes is through manual testing or by running e2e tests against a deployed data factory. These tests are great to have, but they miss the following benefits that unit tests written with this framework provide:

Shift left with immediate feedback on changes - evaluate any individual Data Factory resource (pipelines, activities, triggers, datasets, linkedServices, etc.), including (complex) expressions.
Test individual resources (e.g. an activity) against many different input values to cover more scenarios.
Fewer issues in production - because unit tests are fast to write and run, you will write more tests in less time and therefore reach higher test coverage. This means more confidence in new changes, a lower risk of breaking existing features (regression tests) and thus far fewer issues in production.

Data Factory is UI-driven, and writing unit tests might not come naturally to it. Even so, how can you be confident that your changes will work as expected, and that existing pipelines will not break, without writing unit tests?

Getting started

Set up an empty Python project with your favorite testing library
Install the package using your preferred package manager:
- Pip: pip install data-factory-testing-framework
- Poetry: poetry add data-factory-testing-framework
Start writing tests

Features - Examples

The samples below are the only code you need to write. The framework takes care of the rest.

Evaluate activities (e.g. a WebActivity that calls Azure Batch API)

# Arrange
activity: Activity = pipeline.get_activity_by_name("Trigger Azure Batch Job")
state = PipelineRunState(
    parameters=[
        RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
        RunParameter(RunParameterType.Pipeline, "JobId", "123"),
    ],
    variables=[
        PipelineRunVariable("JobName", "Job-123"),
    ])
state.add_activity_result("Get version", DependencyCondition.SUCCEEDED, {"Version": "version1"})

# Act
activity.evaluate(state)

# Assert
assert "https://example.com/jobs" == activity.type_properties["url"].value
assert "POST" == activity.type_properties["method"].value
body = activity.type_properties["body"].get_json_value()
assert "123" == body["JobId"]
assert "Job-123" == body["JobName"]
assert "version1" == body["Version"]

Evaluate Pipelines and test the flow of activities given a specific input

# Arrange
pipeline: PipelineResource = test_framework.repository.get_pipeline_by_name("batch_job")

# Runs the pipeline with the provided parameters
activities = test_framework.evaluate_pipeline(pipeline, [
    RunParameter(RunParameterType.Pipeline, "JobId", "123"),
    RunParameter(RunParameterType.Pipeline, "ContainerName", "test-container"),
    RunParameter(RunParameterType.Global, "BaseUrl", "https://example.com"),
])

set_variable_activity: Activity = next(activities)
assert set_variable_activity is not None
assert "Set JobName" == set_variable_activity.name
# Assert on the activity that was just returned, not on an undefined `activity` name
assert "JobName" == set_variable_activity.type_properties["variableName"]
assert "Job-123" == set_variable_activity.type_properties["value"].value

get_version_activity = next(activities)
assert get_version_activity is not None
assert "Get version" == get_version_activity.name
assert "https://example.com/version" == get_version_activity.type_properties["url"].value
assert "GET" == get_version_activity.type_properties["method"]
get_version_activity.set_result(DependencyCondition.SUCCEEDED, {"Version": "version1"})

create_batch_activity = next(activities)
assert create_batch_activity is not None
assert "Trigger Azure Batch Job" == create_batch_activity.name
assert "https://example.com/jobs" == create_batch_activity.type_properties["url"].value
assert "POST" == create_batch_activity.type_properties["method"]
body = create_batch_activity.type_properties["body"].get_json_value()
assert "123" == body["JobId"]
assert "Job-123" == body["JobName"]
assert "version1" == body["Version"]

with pytest.raises(StopIteration):
    next(activities)
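
The sample above sets a result on an activity between two `next` calls. A generator is a natural fit for that pattern, because the evaluation pauses after each activity and only decides what runs next once the test has supplied an outcome. The following is a minimal sketch of that idea under stated assumptions: `ToyActivity` and `evaluate_flow` are hypothetical names, conditions are plain strings, and an activity without an explicit result is assumed to have succeeded. It is not the framework's implementation.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class ToyActivity:
    """Hypothetical stand-in for an activity; not the framework's Activity class."""
    name: str
    # Maps the name of an upstream activity to the condition it must end with.
    depends_on: dict[str, str] = field(default_factory=dict)
    result: Optional[str] = None

    def set_result(self, condition: str) -> None:
        self.result = condition


def evaluate_flow(activities: list[ToyActivity]) -> Iterator[ToyActivity]:
    """Yield activities one by one as their dependency conditions are met."""
    results: dict[str, str] = {}
    pending = list(activities)
    while pending:
        ready = [
            activity for activity in pending
            if all(results.get(dep) == cond for dep, cond in activity.depends_on.items())
        ]
        if not ready:
            return  # The remaining activities can never run with these results.
        current = ready[0]
        pending.remove(current)
        yield current
        # Execution resumes here on the following next() call, after the test
        # had the chance to call set_result on the yielded activity.
        results[current.name] = current.result or "Succeeded"


flow = evaluate_flow([
    ToyActivity("Set JobName"),
    ToyActivity("Get version", {"Set JobName": "Succeeded"}),
    ToyActivity("Trigger Azure Batch Job", {"Get version": "Succeeded"}),
    ToyActivity("Notify failure", {"Get version": "Failed"}),
])

assert next(flow).name == "Set JobName"
get_version = next(flow)
get_version.set_result("Failed")  # Simulate a failing call
assert next(flow).name == "Notify failure"
assert next(flow, None) is None  # "Trigger Azure Batch Job" is never reached
```

In this sketch, changing the simulated result from "Failed" to "Succeeded" makes the generator yield "Trigger Azure Batch Job" instead, which is how a single pipeline definition can be tested along several paths.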

See the Examples folder for more samples.

Registering missing expression functions

The framework interprets expressions that contain functions, so it provides its own implementation of each function. Some of these implementations may contain bugs. You can register a missing function or override an existing implementation with:

   FunctionsRepository.register("concat", lambda arguments: "".join(arguments))
   FunctionsRepository.register("trim", lambda text, trim_argument: text.strip(trim_argument[0]))
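
The idea behind such a registry can be sketched as a mapping from function names to callables, where registering under an existing name replaces the previous implementation. The class below, `ToyFunctionsRepository`, is a hypothetical stand-in written for illustration only. Its `call` method and the convention of passing arguments positionally are assumptions, not the framework's API.

```python
from typing import Any, Callable


class ToyFunctionsRepository:
    """Hypothetical registry of expression functions; not the framework's class."""

    # A deliberately naive built-in: it fails on non-string arguments.
    _functions: dict[str, Callable[..., Any]] = {
        "concat": lambda *arguments: "".join(arguments),
    }

    @classmethod
    def register(cls, name: str, function: Callable[..., Any]) -> None:
        """Add a new function or replace the implementation stored under name."""
        cls._functions[name] = function

    @classmethod
    def call(cls, name: str, *arguments: Any) -> Any:
        """Look up a function by name and invoke it with positional arguments."""
        if name not in cls._functions:
            raise KeyError(f"Function '{name}' is not registered")
        return cls._functions[name](*arguments)


# Register a function the toy registry does not know yet.
ToyFunctionsRepository.register("trim", lambda text: text.strip())

# Override the naive built-in so that it also accepts non-string arguments.
ToyFunctionsRepository.register(
    "concat", lambda *arguments: "".join(str(argument) for argument in arguments)
)

assert ToyFunctionsRepository.call("trim", "  job  ") == "job"
assert ToyFunctionsRepository.call("concat", "Job-", 123) == "Job-123"
```

Keeping the lookup by name means a test project can patch a single function without touching the rest of the expression evaluation.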

Tips

After parsing a data factory resource file, you can use the debugger to easily discover which classes are actually initialized so that you can cast them to the correct type.

Recommended development workflow for Azure Data Factory v2

Use ADF Git integration
Use UI to create feature branch, build initial pipeline and save to feature branch
Pull feature branch locally
Start writing unit and functional tests, run them locally for immediate feedback and fix bugs
Push changes to feature branch
Test the new features manually through the UI in sandbox environment
Create PR, which will run the tests in the CI pipeline
Approve PR
Merge to main and start deploying to dev/test/prd environments
Run e2e tests after each deployment to validate all happy flows work on that specific environment

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

This version

0.0.0a18 pre-release

Dec 20, 2023
