Compare MongoDB collections from the command line.

mongo-diff

mongo-diff is a command-line tool for comparing two MongoDB collections.

Those collections can reside in either a single database or two separate databases (even across servers).

graph LR
    script[["mongo_diff.py"]]
    result["List of<br>differences"]

    subgraph s1 [Server]
        subgraph d1 [Database]
            collection_a[("Collection A")]
        end
    end

    subgraph s2 [Server]
        subgraph d2 [Database]
            collection_b[("Collection B")]
        end
    end

    collection_a --> script
    collection_b --> script
    script --> result

Usage

1. (Optional) Create environment variables.

Part of running mongo-diff involves providing MongoDB connection strings to it. Since MongoDB connection strings sometimes contain sensitive information, I recommend storing them in environment variables instead of specifying them via CLI options to mongo-diff.

I think that makes the connection strings less likely to end up in copy/pasted console output or to appear on screen during technical demonstrations.

mongo-diff is pre-programmed to look for two environment variables: MONGO_URI_A and MONGO_URI_B.

You can learn more about those environment variables in the --help snippet below.

You can create those environment variables by running the following commands (replacing the example connection strings with real ones):

$ export MONGO_URI_A='mongodb://localhost:27017'
$ export MONGO_URI_B='mongodb://username:password@host.example.com:22222'

Note: Those commands define the environment variables for the current shell session only. To persist them, add the same commands to your shell initialization script (e.g. ~/.zshrc).
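Before running the tool, it can be handy to confirm that the variables are defined without echoing their values. The following is a minimal sketch of such a check; the helper name describe_uri_vars is hypothetical and not part of mongo-diff, and only the two variable names come from this README.

```python
import os

# Variable names taken from this README; the helper below is hypothetical.
URI_VARS = ("MONGO_URI_A", "MONGO_URI_B")


def describe_uri_vars(environ=os.environ):
    """Report whether each variable is set, without exposing its value."""
    return {name: ("set" if environ.get(name) else "unset") for name in URI_VARS}


if __name__ == "__main__":
    for name, state in describe_uri_vars().items():
        # Only the name and its state are printed, never the connection string.
        print(f"{name}: {state}")
```

Because only "set" or "unset" is printed, the output is safe to paste into a bug report or show during a demonstration.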

2. Set up virtual environment.

# If you don't have Poetry installed yet...
$ pipx install poetry

# Create a Poetry virtual environment and attach to its shell:
$ poetry shell

# At the Poetry virtual environment's shell, install the project's production dependencies:
$ poetry install --only main

3. Use the tool.

At the Poetry virtual environment's shell, use the tool as shown in the --help snippet below.

$ python mongo_diff/mongo_diff.py --help

 Usage: mongo_diff.py [OPTIONS]

 Compare two MongoDB collections, displaying their differences on the console.
 Those collections can reside in either a single database or two separate
 databases (even across servers).

╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --include-id    --no-include-id      Includes the `_id` field when comparing │
│                                      documents.                              │
│                                      [default: no-include-id]                │
│ --help                               Show this message and exit.             │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection A ───────────────────────────────────────────────────────────────╮
│ *  --mongo-uri-a                    TEXT  Connection string for accessing    │
│                                           the MongoDB server containing      │
│                                           collection A.                      │
│                                           [env var: MONGO_URI_A]             │
│                                           [required]                         │
│ *  --database-name-a                TEXT  Name of the database containing    │
│                                           collection A.                      │
│                                           [required]                         │
│ *  --collection-name-a              TEXT  Name of collection A. [required]   │
│    --identifier-field-name-a        TEXT  Name of the field of each document │
│                                           in collection A to use to identify │
│                                           a corresponding document in        │
│                                           collection B.                      │
│                                           [default: id]                      │
│    --is-direct-connection-a               Sets the `directConnection` flag   │
│                                           when connecting to the MongoDB     │
│                                           server containing collection A.    │
│                                           This can be useful when connecting │
│                                           to a replica set.                  │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection B ───────────────────────────────────────────────────────────────╮
│ --mongo-uri-b                    TEXT  Connection string for accessing the   │
│                                        MongoDB server containing collection  │
│                                        B (if different from that specified   │
│                                        for collection A).                    │
│                                        [env var: MONGO_URI_B]                │
│ --database-name-b                TEXT  Name of the database containing       │
│                                        collection B (if different from that  │
│                                        specified for collection A).          │
│ --collection-name-b              TEXT  Name of collection B (if different    │
│                                        from that specified for collection    │
│                                        A).                                   │
│ --identifier-field-name-b        TEXT  Name of the field of each document in │
│                                        collection B to use to identify a     │
│                                        corresponding document in collection  │
│                                        A (if different from that specified   │
│                                        for collection A).                    │
│ --is-direct-connection-b               Sets the `directConnection` flag when │
│                                        connecting to the MongoDB server      │
│                                        containing collection B. Note: If the │
│                                        connection strings for both           │
│                                        collections are identical, this       │
│                                        option will be ignored.               │
╰──────────────────────────────────────────────────────────────────────────────╯

Note: The above --help snippet was captured from a terminal window that was 80 characters wide.
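The help text says that each "Collection B" option applies only "if different from that specified for collection A". One way to read that is as a simple fallback rule, sketched below. This is an illustration of that reading, not the tool's actual code; the CollectionTarget class and resolve_target_b function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CollectionTarget:
    """Hypothetical container for the options that describe one collection."""
    mongo_uri: str
    database_name: str
    collection_name: str
    identifier_field_name: str = "id"  # default shown in the --help output


def resolve_target_b(
    target_a: CollectionTarget,
    mongo_uri: Optional[str] = None,
    database_name: Optional[str] = None,
    collection_name: Optional[str] = None,
    identifier_field_name: Optional[str] = None,
) -> CollectionTarget:
    """Fill in any option omitted for collection B with collection A's value."""
    return CollectionTarget(
        mongo_uri=mongo_uri or target_a.mongo_uri,
        database_name=database_name or target_a.database_name,
        collection_name=collection_name or target_a.collection_name,
        identifier_field_name=identifier_field_name or target_a.identifier_field_name,
    )


if __name__ == "__main__":
    a = CollectionTarget("mongodb://localhost:27017", "app", "users")
    # Only the collection name differs, so everything else is inherited from A.
    print(resolve_target_b(a, collection_name="users_backup"))
```

Under this reading, comparing two collections in the same database only requires passing --collection-name-b in addition to the required "Collection A" options.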

Example output

As the tool compares the collections, it displays each difference it detects, like this:

Documents differ between collections: id=1,id=1. Differences: [('change', 'name', ('Joe', 'Joseph'))]
Document exists in collection A only: id=2
Document exists in collection A only: id=4
Document exists in collection B only: id=5
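To make the "Differences" list easier to read, here is a minimal sketch of a field-by-field comparison for flat documents. It is an illustration, not the tool's implementation: the function name diff_documents is hypothetical, the ('change', field, (old, new)) shape is copied from the sample output above, and the 'add' and 'remove' shapes are assumptions.

```python
def diff_documents(doc_a, doc_b, include_id=False):
    """Compare two flat documents (dicts) and list their differences.

    By default the `_id` field is skipped, mirroring the tool's
    `--no-include-id` default.
    """
    ignored = set() if include_id else {"_id"}
    differences = []
    for key in sorted((doc_a.keys() | doc_b.keys()) - ignored):
        if key not in doc_b:
            # Assumed shape: field present in A only.
            differences.append(("remove", key, doc_a[key]))
        elif key not in doc_a:
            # Assumed shape: field present in B only.
            differences.append(("add", key, doc_b[key]))
        elif doc_a[key] != doc_b[key]:
            # Shape taken from the sample output: (old value, new value).
            differences.append(("change", key, (doc_a[key], doc_b[key])))
    return differences


if __name__ == "__main__":
    a = {"_id": "x", "id": 1, "name": "Joe"}
    b = {"_id": "y", "id": 1, "name": "Joseph"}
    print(diff_documents(a, b))
```

Skipping `_id` by default matters because two copies of the same logical document usually receive different `_id` values, which would otherwise mark every pair as different.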

When the tool finishes comparing the collections, it displays a summary of the result, like this:

                         Result                         
╭───────────────────────────────────────────┬──────────╮
│ Description                               │ Quantity │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A                 │        4 │
│ Documents in collection B                 │        3 │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A only            │        2 │
│ Documents in collection B only            │        1 │
├───────────────────────────────────────────┼──────────┤
│ Documents that differ between collections │        1 │
╰───────────────────────────────────────────┴──────────╯
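The quantities in that table can be derived from the two sets of identifiers. The sketch below shows one way to compute them for in-memory lists of documents. It is not the tool's code: the summarize function and the sample data are hypothetical, chosen so that the counts match the table above.

```python
def summarize(docs_a, docs_b, id_field="id"):
    """Count the documents unique to each side and those that differ."""
    by_id_a = {doc[id_field]: doc for doc in docs_a}
    by_id_b = {doc[id_field]: doc for doc in docs_b}
    shared = by_id_a.keys() & by_id_b.keys()

    def comparable(doc):
        # Ignore `_id`, as the tool does by default (--no-include-id).
        return {key: value for key, value in doc.items() if key != "_id"}

    differing = sum(
        1 for key in shared if comparable(by_id_a[key]) != comparable(by_id_b[key])
    )
    return {
        "Documents in collection A": len(docs_a),
        "Documents in collection B": len(docs_b),
        "Documents in collection A only": len(by_id_a.keys() - by_id_b.keys()),
        "Documents in collection B only": len(by_id_b.keys() - by_id_a.keys()),
        "Documents that differ between collections": differing,
    }


# Hypothetical sample data consistent with the example output above.
SAMPLE_A = [
    {"id": 1, "name": "Joe"},
    {"id": 2, "name": "Ann"},
    {"id": 3, "name": "Sam"},
    {"id": 4, "name": "Lee"},
]
SAMPLE_B = [
    {"id": 1, "name": "Joseph"},
    {"id": 3, "name": "Sam"},
    {"id": 5, "name": "Kim"},
]

if __name__ == "__main__":
    for description, quantity in summarize(SAMPLE_A, SAMPLE_B).items():
        print(f"{description}: {quantity}")
```

Indexing each side by its identifier keeps the comparison to a few set operations. For large collections, a real implementation would likely stream documents rather than hold both collections in memory.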

Development

We use Poetry to both (a) manage dependencies and (b) publish packages to PyPI.

  • pyproject.toml: Configuration file for Poetry and other tools (was generated via $ poetry init)
  • poetry.lock: List of dependencies, direct and indirect (was generated via $ poetry update)

Create virtual environment

Create a Poetry virtual environment and attach to its shell:

poetry shell

You can see information about the Poetry virtual environment by running: $ poetry env info

You can detach from the Poetry virtual environment's shell by running: $ exit

From now on, I'll refer to the Poetry virtual environment's shell as the "Poetry shell."

Install dependencies

At the Poetry shell, install the project's dependencies:

poetry install

Make changes

Edit the tool's source code and documentation however you want.

Build package

At the Poetry shell, build the package based upon the latest source code:

poetry build

That will create both a source distribution file (whose name ends with .tar.gz) and a wheel file (whose name ends with .whl) in the dist directory.

Publish package

At the Poetry shell, configure Poetry to use your PyPI credentials, if you haven't already done so.

poetry config http-basic.pypi {your_PyPI_username}

Poetry will prompt you for your PyPI password.

I use the username "__token__" and, as the password, an adequately scoped API token created via the PyPI website.

At the Poetry shell, publish the newly-built package to PyPI:

poetry publish
