Microsoft SQL Server bcp (Bulk Copy) wrapper with Azure Synapse Blob alternative
Project description
bcpyaz
What is it?
This package is a wrapper for Microsoft's SQL Server bcp utility. Current database drivers available in Python are not fast enough for transferring millions of records (yes, I have tried pyodbc fast_execute_many). Despite the IO hits, the fastest option by far is saving the data to a CSV file in file system (preferably /dev/shm tmpfs) and using the bcp utility to transfer the CSV file to SQL Server.
Azure Synapse / Blob extensions
If the following env vars are set (with examples):
AZURE_STORAGE_CONNECTION_STRING="DefaultEndpointsProtocol=https;AccountName=myblogstorageacct;AccountKey=jhadsfas235876dha90/17891hfh25125/12h2udfjs347512==;EndpointSuffix=core.windows.net"
AZURE_TEMP_STORAGE_CONTAINER="tempcontainer"
Then instead of using the bcp command line utility, the CSV file will be copied to the Azure Storage Blob temporarily and COPY will be used to transfer it from there into the Synapse Database.
You may need to
GRANT ADMINISTER DATABASE BULK OPERATIONS TO <user>
on Synapse to the user that is connecting.
The sqlcmd utility will still be required.
How Can I Install It?
Make sure your computeer has the requirements.
Install locally from a git clone:
pip install -e .
or via requirements.txt:
-e git+https://git@github.com/Arcturus-io/bcpyaz.git#egg=bcpyaz
Examples
Following examples show you how to load (1) flat files and (2) DataFrame objects to SQL Server using this package.
Flat File
Following example assumes that you have a comma separated file with no qualifier in path 'tests/data1.csv'. The code below sends the the file to SQL Server.
import bcpy
sql_config = {
'server': 'sql_server_hostname',
'database': 'database_name',
'username': 'test_user',
'password': 'test_user_password1234'
}
sql_table_name = 'test_data1'
csv_file_path = 'tests/data1.csv'
flat_file = bcpy.FlatFile(qualifier='', path=csv_file_path)
sql_table = bcpy.SqlTable(sql_config, table=sql_table_name)
flat_file.to_sql(sql_table)
DataFrame
The following example creates a DataFrame with 100 rows and 4 columns populated with random data and then it sends it to SQL Server.
import bcpy
import numpy as np
import pandas as pd
sql_config = {
'server': 'sql_server_hostname',
'database': 'database_name',
'username': 'test_user',
'password': 'test_user_password1234'
}
table_name = 'test_dataframe'
df = pd.DataFrame(np.random.randint(-100, 100, size=(100, 4)),
columns=list('ABCD'))
bdf = bcpy.DataFrame(df)
sql_table = bcpy.SqlTable(sql_config, table=table_name)
bdf.to_sql(sql_table)
Requirements
You need a working version of Microsoft bcp installed in your system. Your PATH environment variable should contain the directory of the bcp utility. Following are the installation tutorials for different operating systems.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file bcpyaz-0.2.0.tar.gz
.
File metadata
- Download URL: bcpyaz-0.2.0.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.23.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | bc2a93d7f387e8b26162b2fe9cc5dc70d90c453db2971705fe59ce78234c3300 |
|
MD5 | cee89eb13762b02071df66f7f80b7eda |
|
BLAKE2b-256 | 2573bdf14576db83317778d1a54babcfe043714191b79f52f93da928e8912ce9 |
Provenance
File details
Details for the file bcpyaz-0.2.0-py3-none-any.whl
.
File metadata
- Download URL: bcpyaz-0.2.0-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.23.0 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 47c70c5bc24a40da78e0fd5ad01c7722ed2b4bc0272906e52cfab0159b56d5c5 |
|
MD5 | f07ed8628cf16f36f7ebb3aab6a72080 |
|
BLAKE2b-256 | 85f6c9f3fd0764f351c8eb6be16c8b29296efadae66fbe0019e6c4b85c44af49 |