Building Entity Relationship Diagrams for Google BigQuery.
Project description
bigquery-erd
Entity Relationship Diagram (ERD) Generator for Google BigQuery, based upon eralchemy.
Examples
ERD for a NewsMeme database schema (taken from the original project).
Installation
pip install bigquery-erd
eralchemy
requires GraphViz to generate the graphs and Python. Both are available for Windows, Mac and Linux.
Usage
Usage from Python
Find an example notebook here.
But Wait, BigQuey is not a Relation Database?
That's right. You cannot enforce primary or foreign key constraints in BigQuery. However, that doesn't mean that you should not be able to have logical dependencies between tables.
Defining Relations through Column Descriptions
We use the column description field in BigQuery to define relations between columns in a format that we can later parse programmatically.
Let's assume we have a table a
with a column id
and another table a
with a column a_id
that serves as a foreign key relation to a.id
.
We then add the following description to b.a_id
:
-> b.id
Defining Relations to Datasets Explicitly
Per default, we assume that the related tables are located inside the same dataset. However, you can also define the datasets explicitly. This is especially useful if the two related tables are not located within the same dataset.
Let's assume that table a
is located in dataset d1
and table b
is located in d2
.
The description in b.a_id
would then be:
-> d1.a.id
Defining Cardinality Explicitly
Cardinality defines the relationship between two tables. This package understands four different cardinalities:
*
, meaning "0..N"?
, meaning "{0,1}"+
, meaning "1..N"1
, meaning "1"
Per default, we assume a cardinality of *:1
. You can also define the relation's cardinality explicitly.
Let's assume that every record in a
has at least 1 related record in b
.
the description in b.a_id
would be:
-> +:1 a.id
Example
You can find a example Google BigQuery project for the NewsMeme schema with annotated descriptions here.
Defining Custom Description RegEx
The default RegEx for relations in column descriptions is ->\s([?*+1 ]:[?*+1 ]\s)?(.*\.)?(.*)\.(.*)$
. You can define a custom RegEx by setting the GBQ_RELATION_PATTERN
environment variable.
The RegEx should match four capture groups, where:
- The first group is the cardinality (which is optional)
- The second group is the dataset id (which is optional)
- The third group is the table id
- The fourth group is the column
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file bigquery-erd-0.1.0.tar.gz
.
File metadata
- Download URL: bigquery-erd-0.1.0.tar.gz
- Upload date:
- Size: 7.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.8.0 Linux/4.15.0-1077-gcp
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f707a0e3069e2e9181830ed1ffecdc1f04726de83eb69098765c27a92e142465 |
|
MD5 | 7727b416451a5c78cc3062fc4aba4243 |
|
BLAKE2b-256 | 6c3588d6c1983f06b4104334a7f58cfcd288de3e9182951d5dfb184e38a8c13c |
File details
Details for the file bigquery_erd-0.1.0-py3-none-any.whl
.
File metadata
- Download URL: bigquery_erd-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.4 CPython/3.8.0 Linux/4.15.0-1077-gcp
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 0e8df8829c173c98ef8db99419fd47fc3aa98a297d7918eb8613bb28fa6723a8 |
|
MD5 | 5a97da749cf440050cb8bca701924800 |
|
BLAKE2b-256 | 5fb0f792b84138fb1416b339a425af1244a8b7e757c0870cf2aa79736880de0e |