Skip to main content

PostgreSQL selective data dump

Project description

This tool allows to extract data from a PostgreSQL database with greater flexibility that tools like pg_dump allow.

pg_seldump reads one of more dump definitions from YAML files and selects what tables or other database objects to save. It is possible to extract only certain columns of the tables, only certain records, or to replace certain values with a different expression, for instance to anonymize data.

The output of the program is a text file which can be used by psql to restore data into a database with a complete schema but with no data (or at least no conflicting data), e.g. using:

$ pg_seldump --dsn="dbname=sourcedb" datadump.yaml > dump.sql
...
$ psql -1X --set ON_ERROR_STOP=1 -f dump.sql "dbname=targetdb"

Program usage

Usage:

pg_seldump [-h] [--version] [--dsn DSN] [--outfile OUTFILE] [--test]
           [-q | -v]
           config [config ...]

Create a selective dump of a PostgreSQL database.

positional arguments:
  config                yaml file describing the data to dump

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --dsn DSN             database connection string [default: '']
  --outfile OUTFILE, -o OUTFILE
                        the file where to save the dump [default: stdout]
  --test                test the configuration to verify it works as expected
  -q, --quiet           talk less
  -v, --verbose         talk more

The config files must be YAML files containing a db_objects list of entries. Each entry may have:

Selectors (all the specified ones must match):

  • name: name of the db object to dump

  • names: list of names or regex of db objects to dump

  • schema: schema name of the db object to dump

  • schemas: list of schema names or regexp to match schema names of the db object to dump

  • kind: kind of object to match (table, sequence, a few others)

  • kinds: list of kind of objects to match (table, sequence, a few others)

  • adjust_score: adjustment for the match score to break rules ties

Data modifiers:

  • action: what to do with the matched object:

    • dump: dump the object in the output (default)

    • skip: don’t dump the object

    • error: raise an error in case of match (useful to create strict description where all the db objects must be mentioned explicitly)

  • no_columns: list of columns names to omit

  • filter: WHERE condition to include only a subset of the records in the dump

  • replace: mapping from column names to SQL expressions to replace values into the dump with somethings else

The objects in the database are matched to the rules in the config files. Every match will have a score according to how specific was the selector matched the object.

  • name or names list: 1000

  • names regexp: 500

  • schema or schemas list: 100

  • schemas regexp: 50

  • kind or kinds: 10

The rule with the highest score will apply. If two rules have exactly the same score the program will report an error: you can use adjust_score to break the tie.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pg_seldump-0.1.tar.gz (12.8 kB view details)

Uploaded Source

File details

Details for the file pg_seldump-0.1.tar.gz.

File metadata

  • Download URL: pg_seldump-0.1.tar.gz
  • Upload date:
  • Size: 12.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.6.9

File hashes

Hashes for pg_seldump-0.1.tar.gz
Algorithm Hash digest
SHA256 553e7136fa97ed635f109bde946b0d2ecb798a206c27577c9efb7c2fb63667e1
MD5 7ceaf5e81295b2aa393f33d265c477b0
BLAKE2b-256 cb26cdfd8f4210d1085edc547b88e3d7dfd570dc2c298f805064189f268bbe55

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page