Compare MongoDB collections from the command line.
mongo-diff
mongo-diff is a command-line tool people can use to compare two MongoDB collections.
Those collections can reside in either a single database or two separate databases (even across servers).
Diagram: mongo_diff.py reads Collection A (in a database on one server) and Collection B (in a database on another server) and outputs a list of differences.
Usage
1. (Optional) Create environment variables.
Part of running mongo-diff involves providing MongoDB connection strings to it. Since MongoDB connection strings sometimes contain sensitive information, I recommend storing them in environment variables instead of specifying them via CLI options to mongo-diff. I think that will make it less likely that they are accidentally included in copy/pasted console output or in technical demonstrations.
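Keeping credentials out of console output also matters if you ever print a connection string while debugging. Here is a minimal Python sketch of a hypothetical helper, redact_uri (it is not part of mongo-diff), that masks the password in a connection string before it is displayed. It assumes the usual mongodb:// or mongodb+srv:// layout, where any credentials precede the last @ in the host portion.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_uri(uri: str) -> str:
    """Return the connection string with its password (if any) replaced by ***.

    Hypothetical helper for illustration; not part of mongo-diff.
    """
    parts = urlsplit(uri)
    # The userinfo ("username:password") precedes the last "@" in the netloc;
    # the host list (possibly "h1:27017,h2:27017") follows it.
    userinfo, at, hosts = parts.netloc.rpartition("@")
    if not at or ":" not in userinfo:
        return uri  # no credentials, or no password, to hide
    username = userinfo.split(":", 1)[0]
    return urlunsplit(parts._replace(netloc=f"{username}:***@{hosts}"))


print(redact_uri("mongodb://username:password@host.example.com:22222"))
# mongodb://username:***@host.example.com:22222
print(redact_uri("mongodb://localhost:27017"))
# mongodb://localhost:27017
```

Splitting on the last @ (rather than parsing a single host and port) keeps the helper working for replica-set connection strings that list several hosts.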
mongo-diff is pre-programmed to look for two environment variables: MONGO_URI_A and MONGO_URI_B. You can learn more about those environment variables in the --help snippet below.
You can create those environment variables by running the following commands (replacing the example connection strings with real ones):
$ export MONGO_URI_A='mongodb://localhost:27017'
$ export MONGO_URI_B='mongodb://username:password@host.example.com:22222'
Note: Those commands only create the environment variables in the current shell process. You can persist them by adding the same commands to your shell initialization script (e.g. ~/.zshrc).
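The --help snippet below marks --mongo-uri-a with [env var: MONGO_URI_A] and describes the collection B options as applying only "if different from that specified for collection A". Here is a minimal Python sketch of that lookup order, assuming that a CLI option takes precedence over its environment variable and that each collection B setting falls back to the collection A value. The resolve function is hypothetical, not mongo-diff's actual code.

```python
import os
from typing import Optional


def resolve(cli_value: Optional[str], env_var: str, fallback: Optional[str] = None) -> Optional[str]:
    """Pick a setting: CLI option first, then environment variable, then fallback.

    Hypothetical helper illustrating the assumed precedence.
    """
    if cli_value is not None:
        return cli_value
    env_value = os.environ.get(env_var)
    if env_value:  # treat an empty variable as unset
        return env_value
    return fallback


# Simulate having exported only MONGO_URI_A.
os.environ["MONGO_URI_A"] = "mongodb://localhost:27017"
os.environ.pop("MONGO_URI_B", None)

uri_a = resolve(None, "MONGO_URI_A")
uri_b = resolve(None, "MONGO_URI_B", fallback=uri_a)  # B defaults to A's server
print(uri_a, uri_b)
# mongodb://localhost:27017 mongodb://localhost:27017
```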
2. Set up virtual environment.
# If you don't have Poetry installed yet...
$ pipx install poetry
# Create a Poetry virtual environment and attach to its shell:
$ poetry shell
# At the Poetry virtual environment's shell, install the project's production dependencies:
$ poetry install --only main
3. Use the tool.
At the Poetry virtual environment's shell, use the tool as shown in the --help snippet below.
$ python mongo_diff/mongo_diff.py --help
Usage: mongo_diff.py [OPTIONS]
Compare two MongoDB collections, displaying their differences on the console.
Those collections can reside in either a single database or two separate
databases (even across servers).
╭─ Options ────────────────────────────────────────────────────────────────────╮
│ --include-id --no-include-id Includes the `_id` field when comparing │
│ documents. │
│ [default: no-include-id] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection A ───────────────────────────────────────────────────────────────╮
│ * --mongo-uri-a TEXT Connection string for accessing │
│ the MongoDB server containing │
│ collection A. │
│ [env var: MONGO_URI_A] │
│ [required] │
│ * --database-name-a TEXT Name of the database containing │
│ collection A. │
│ [required] │
│ * --collection-name-a TEXT Name of collection A. [required] │
│ --identifier-field-name-a TEXT Name of the field of each document │
│ in collection A to use to identify │
│ a corresponding document in │
│ collection B. │
│ [default: id] │
│ --is-direct-connection-a Sets the `directConnection` flag │
│ when connecting to the MongoDB │
│ server containing collection A. │
│ This can be useful when connecting │
│ to a replica set. │
╰──────────────────────────────────────────────────────────────────────────────╯
╭─ Collection B ───────────────────────────────────────────────────────────────╮
│ --mongo-uri-b TEXT Connection string for accessing the │
│ MongoDB server containing collection │
│ B (if different from that specified │
│ for collection A). │
│ [env var: MONGO_URI_B] │
│ --database-name-b TEXT Name of the database containing │
│ collection B (if different from that │
│ specified for collection A). │
│ --collection-name-b TEXT Name of collection B (if different │
│ from that specified for collection │
│ A). │
│ --identifier-field-name-b TEXT Name of the field of each document in │
│ collection B to use to identify a │
│ corresponding document in collection │
│ A (if different from that specified │
│ for collection A). │
│ --is-direct-connection-b Sets the `directConnection` flag when │
│ connecting to the MongoDB server │
│ containing collection B. Note: If the │
│ connection strings for both │
│ collections are identical, this │
│ option will be ignored. │
╰──────────────────────────────────────────────────────────────────────────────╯
Note: The above --help snippet was captured from a terminal window that was 80 characters (columns) wide.
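To illustrate what --include-id affects, here is a minimal Python sketch that compares the top-level fields of two documents represented as dicts. The ("change", field, (old, new)) tuple shape follows the example output below; the "add"/"remove" shapes, the function name diff_documents, and the top-level-only comparison are assumptions, not mongo-diff's actual implementation.

```python
from typing import Any


def diff_documents(doc_a: dict[str, Any], doc_b: dict[str, Any], include_id: bool = False) -> list[tuple]:
    """Compare the top-level fields of two documents.

    Returns ("change", field, (value_a, value_b)) for fields whose values differ,
    ("remove", field, value_a) for fields only in doc_a, and
    ("add", field, value_b) for fields only in doc_b.
    """
    ignored = set() if include_id else {"_id"}  # --no-include-id is the default
    differences = []
    for field in sorted((doc_a.keys() | doc_b.keys()) - ignored):
        if field not in doc_b:
            differences.append(("remove", field, doc_a[field]))
        elif field not in doc_a:
            differences.append(("add", field, doc_b[field]))
        elif doc_a[field] != doc_b[field]:
            differences.append(("change", field, (doc_a[field], doc_b[field])))
    return differences


# Hypothetical documents; their _id values differ, as they typically would
# in two independently populated collections.
doc_a = {"_id": "a1", "id": 1, "name": "Joe"}
doc_b = {"_id": "b1", "id": 1, "name": "Joseph"}

print(diff_documents(doc_a, doc_b))
# [('change', 'name', ('Joe', 'Joseph'))]
print(diff_documents(doc_a, doc_b, include_id=True))
# [('change', '_id', ('a1', 'b1')), ('change', 'name', ('Joe', 'Joseph'))]
```

Ignoring _id by default means two documents that agree on every other field are reported as identical even when their _id values were generated separately.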
Example output
As the tool compares the collections, it displays each difference it detects, like this:
Documents differ between collections: id=1,id=1. Differences: [('change', 'name', ('Joe', 'Joseph'))]
Document exists in collection A only: id=2
Document exists in collection A only: id=4
Document exists in collection B only: id=5
When the tool finishes comparing the collections, it displays a summary of the result, like this:
Result
╭───────────────────────────────────────────┬──────────╮
│ Description │ Quantity │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A │ 4 │
│ Documents in collection B │ 3 │
├───────────────────────────────────────────┼──────────┤
│ Documents in collection A only │ 2 │
│ Documents in collection B only │ 1 │
├───────────────────────────────────────────┼──────────┤
│ Documents that differ between collections │ 1 │
╰───────────────────────────────────────────┴──────────╯
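The summary quantities follow from matching documents by their identifier field. Here is a minimal in-memory Python sketch of that tally, assuming identifier values are unique within each collection and that both collections use the same identifier field. The function names and the sample documents (ids 2 through 5 and their names) are hypothetical, chosen only to reproduce the example output above.

```python
from typing import Any

Document = dict[str, Any]


def without_id(doc: Document) -> Document:
    """Drop MongoDB's _id field, mirroring the tool's default (--no-include-id)."""
    return {field: value for field, value in doc.items() if field != "_id"}


def summarize(docs_a: list[Document], docs_b: list[Document], id_field: str = "id") -> dict[str, int]:
    """Match documents by id_field and tally the quantities in the summary table."""
    by_id_a = {doc[id_field]: without_id(doc) for doc in docs_a}
    by_id_b = {doc[id_field]: without_id(doc) for doc in docs_b}
    only_a = sorted(by_id_a.keys() - by_id_b.keys())
    only_b = sorted(by_id_b.keys() - by_id_a.keys())
    for key in only_a:
        print(f"Document exists in collection A only: {id_field}={key}")
    for key in only_b:
        print(f"Document exists in collection B only: {id_field}={key}")
    shared = by_id_a.keys() & by_id_b.keys()
    differing = [key for key in shared if by_id_a[key] != by_id_b[key]]
    return {
        "Documents in collection A": len(docs_a),
        "Documents in collection B": len(docs_b),
        "Documents in collection A only": len(only_a),
        "Documents in collection B only": len(only_b),
        "Documents that differ between collections": len(differing),
    }


# Hypothetical sample data.
collection_a = [
    {"_id": 101, "id": 1, "name": "Joe"},
    {"_id": 102, "id": 2, "name": "Ann"},
    {"_id": 103, "id": 3, "name": "Sue"},
    {"_id": 104, "id": 4, "name": "Bob"},
]
collection_b = [
    {"_id": 201, "id": 1, "name": "Joseph"},
    {"_id": 203, "id": 3, "name": "Sue"},
    {"_id": 205, "id": 5, "name": "Eve"},
]

for description, quantity in summarize(collection_a, collection_b).items():
    print(f"{description}: {quantity}")
```

A real run against MongoDB would stream documents from each collection rather than hold both in memory; this sketch keeps everything in dicts for brevity.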
Development
We use Poetry to both (a) manage dependencies and (b) publish packages to PyPI.
- pyproject.toml: Configuration file for Poetry and other tools (was generated via $ poetry init)
- poetry.lock: List of dependencies, direct and indirect (was generated via $ poetry update)
Create virtual environment
Create a Poetry virtual environment and attach to its shell:
$ poetry shell
You can see information about the Poetry virtual environment by running:
$ poetry env info
You can detach from the Poetry virtual environment's shell by running:
$ exit
From now on, I'll refer to the Poetry virtual environment's shell as the "Poetry shell."
Install dependencies
At the Poetry shell, install the project's dependencies:
$ poetry install
Make changes
Edit the tool's source code and documentation however you want.
Build package
Update package version
PyPI doesn't allow people to publish the same "version" of a package multiple times.
You can update the version identifier of the package by running:
$ poetry version {version_or_keyword}
You can replace {version_or_keyword} with either a literal version identifier (e.g. 0.1.1) or a keyword (e.g. major, minor, or patch). You can run $ poetry version --help to see the valid keywords.
Alternatively, you can manually edit the version line in pyproject.toml:
- version = "0.1.0"
+ version = "0.1.1"
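For a plain MAJOR.MINOR.PATCH version, the three keywords behave as in this simplified Python sketch. The bump_version function is hypothetical and ignores the pre-release keywords that poetry version also accepts.

```python
def bump_version(version: str, keyword: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string (simplified sketch)."""
    major, minor, patch = (int(part) for part in version.split("."))
    if keyword == "major":
        return f"{major + 1}.0.0"  # lower components reset to 0
    if keyword == "minor":
        return f"{major}.{minor + 1}.0"
    if keyword == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unsupported keyword: {keyword!r}")


for keyword in ("patch", "minor", "major"):
    print(keyword, bump_version("0.1.0", keyword))
# patch 0.1.1
# minor 0.2.0
# major 1.0.0
```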
Build package
At the Poetry shell, build the package based upon the latest source code:
$ poetry build
That will create both a source distribution file (whose name ends with .tar.gz) and a wheel file (whose name ends with .whl) in the dist directory.
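If you want to confirm that both artifacts exist before publishing, here is a minimal Python sketch of a hypothetical check_dist helper. It assumes the file-name patterns {name}-{version}.tar.gz and {name}-{version}-*.whl, which match the mongo_diff-0.1.2 file names listed at the end of this page.

```python
def check_dist(filenames: list[str], name: str, version: str) -> tuple[str, str]:
    """Return the (sdist, wheel) file names for one release, or raise if either is missing."""
    sdist = f"{name}-{version}.tar.gz"
    wheels = sorted(
        filename
        for filename in filenames
        if filename.startswith(f"{name}-{version}-") and filename.endswith(".whl")
    )
    if sdist not in filenames or not wheels:
        raise FileNotFoundError(f"expected {sdist} and a {name}-{version}-*.whl wheel")
    return sdist, wheels[0]
```

For example, after building version 0.1.2 you could call check_dist(os.listdir("dist"), "mongo_diff", "0.1.2").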
Publish package
Set up PyPI credentials
At the Poetry shell, create the following environment variable, which Poetry checks when credentials aren't provided to it in some other way:
$ export POETRY_PYPI_TOKEN_PYPI="{api_token}"
Replace {api_token} with a PyPI API token whose scope includes the PyPI project to which you want to publish the package.
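PyPI API tokens start with the prefix pypi-, so a quick pre-publish check can catch a missing variable or a forgotten {api_token} placeholder without ever printing the token. Here is a minimal Python sketch; the function name is hypothetical, and it only checks the prefix, so it cannot tell whether the token is valid or correctly scoped.

```python
import os


def pypi_token_is_set(env_var: str = "POETRY_PYPI_TOKEN_PYPI") -> bool:
    """Return True if env_var holds something shaped like a PyPI API token."""
    token = os.environ.get(env_var, "")
    return token.startswith("pypi-")  # the literal placeholder "{api_token}" fails this
```

For example, a release script could run if not pypi_token_is_set(): raise SystemExit("POETRY_PYPI_TOKEN_PYPI is not set") before calling poetry publish.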
Publish package to PyPI
At the Poetry shell, publish the newly built package to PyPI:
$ poetry publish
Download files
File details
Details for the file mongo_diff-0.1.2.tar.gz.
File metadata
- Download URL: mongo_diff-0.1.2.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.12 Darwin/23.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 27675bdc0498a414256ab3232c4bfe227c6379d69250a13becdc44cae2b192c6
MD5 | 3ed414278883383edeb91db6149370a9
BLAKE2b-256 | d03fc7daab80da14789b3508607651cf63bf5a53293b19fb0e229b97d8fc7a50
File details
Details for the file mongo_diff-0.1.2-py3-none-any.whl.
File metadata
- Download URL: mongo_diff-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.7.1 CPython/3.10.12 Darwin/23.2.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2cf1e663fdfb0df9b5fe637c67cfa9d273c1b0387b206f7b257ffcb3c69f1d19
MD5 | 3fa1f7db0f75451f59414db1fdf10f38
BLAKE2b-256 | cd32a8419eb7bccea430ab809468c495da8ae07c23559260c1eca20b026cba56
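The SHA256 digests listed for each file let you check a downloaded copy before installing it. Here is a minimal Python sketch using hashlib; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as file:
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_sha256: str) -> bool:
    """Compare a file against a published SHA256 digest (case- and whitespace-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify_download(Path("mongo_diff-0.1.2.tar.gz"), "27675bdc0498a414256ab3232c4bfe227c6379d69250a13becdc44cae2b192c6") should return True for an intact copy of the source distribution listed above.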