A command-line tool (and Python library) to extract data from CommCareHQ into a SQL database or Excel workbook
CommCare Export
===============
https://github.com/dimagi/commcare-export
|Build status|
A Python library and command-line tool to generate customized exports
from the CommCareHQ REST API.
Installation & Quick Start
--------------------------
0. Install Python and ``pip``. We recommend a version beginning with
“2.7”. (This tool is tested with Python 2.6, 2.7, and 3.3)
1. Install CommCare Export via ``pip``
::
$ pip install commcare-export
Or, during development:
::
$ git clone git@github.com:dimagi/commcare-export.git
$ cd commcare-export
$ mkvirtualenv commcare-export
$ pip install -e .
Now the fastest way to try it out is to follow these steps:
1. Sign up for CommCare!
2. Create a project space.
3. Go to the CommCareHQ Exchange and add the “Simple CommCare
Demo/Tutorial” app to your project space.
4. Go to the app, enable CloudCare for it, and save it.
5. Go to the release manager, make a build, and click the star to
release it.
6. Go to CloudCare and, in the registration module, fill out a few
registration forms.
7. Edit ``examples/demo-registration.json`` and
``examples/demo-registration.xlsx`` to set the ``app_id`` to your app (it
is in the URL bar when you are viewing the app).
8. Run the example Excel configuration on the command line, with your
info provided where indicated:
::
$ commcare-export \
    --query examples/demo-registration.xlsx \
    --domain YOUR_DOMAIN \
    --output-format markdown
or, equivalently
::
$ commcare-export \
    --query examples/demo-registration.json \
    --domain YOUR_DOMAIN \
    --output-format markdown
You’ll see the tables printed out. Change to
``--output-format sql --output URL_TO_YOUR_DB --since DATE`` to sync all
forms submitted since that date.
Command-line Usage
------------------
The command-line tool is typically run with a saved Excel or JSON
query (see below for how to write these):
::
$ commcare-export --commcare-hq <URL or alias like "local" or "prod"> \
    --username <username> \
    --domain <domain> \
    --version <api version, defaults to latest known> \
    --query <excel file, json file, or raw json> \
    --output-format <csv, xls, xlsx, json, markdown, sql> \
    --output <file name or SQL database URL>
See ``commcare-export --help`` for the full list of options.
There are example query files for the CommCare Demo App (available on
the CommCareHQ Exchange) in the ``examples/`` directory.
Excel Queries
-------------
An Excel query is any ``.xlsx`` workbook. Each sheet in the workbook
represents one table you wish to create. The following columns
configure the table:
- **Data Source**: Set this to ``form`` to export form data, or
``case`` for case data.
- **Filter Name** / **Filter Value**: These columns are paired up to
filter the input cases or forms.
- **Field**: The destination in your SQL database for the value.
- **Source Field**: The particular field from the form you wish to
extract. This can be any JSON path.
JSON Queries
------------
JSON queries are described in the MiniLinq Reference table below. You
build a JSON object that represents the query you have in mind. A good
way to get started is to work from the examples, or you could make an
Excel query and run the tool with ``--dump-query`` to see the resulting
JSON query.
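
As a rough illustration, a JSON query can be assembled from plain Python dictionaries and serialized with the standard ``json`` module. The sketch below uses only the node names that appear in the reference table (``Map``, ``Apply``, ``Ref``, ``Lit``, ``List``). The helper functions ``ref`` and ``lit`` and the ``YOUR_APP_ID`` value are placeholders invented for this example, not part of the library.

```python
import json


def ref(name):
    """Build a {"Ref": name} node (helper invented for this sketch)."""
    return {"Ref": name}


def lit(value):
    """Build a {"Lit": value} node (helper invented for this sketch)."""
    return {"Lit": value}


# Map over the forms returned by api_data, keeping two fields per form.
query = {
    "Map": {
        "source": {
            "Apply": {
                "fn": ref("api_data"),
                "args": [
                    lit("form"),
                    # YOUR_APP_ID is a placeholder, not a real app id.
                    lit({"filter": {"term": {"app_id": "YOUR_APP_ID"}}}),
                ],
            }
        },
        "body": {"List": [ref("received_on"), ref("form.gender")]},
    }
}

query_text = json.dumps(query, indent=2)
print(query_text)
```

The resulting text could be saved to a file and passed to the tool with ``--query``.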
Python Library Usage
--------------------
As a library, the various ``commcare_export`` modules make it easy to:
- Interact with the CommCareHQ REST API
- Execute “MiniLinq” queries against the API (a very simple query
language, described below)
- Load and save JSON representations of MiniLinq queries
- Compile Excel configurations to MiniLinq queries
To directly access the CommCareHQ REST API:
::
>>> import getpass
>>> from commcare_export.commcare_hq_client import CommCareHqClient
>>> api_client = CommCareHqClient('http://commcarehq.org', domain='your_domain').authenticated('your_username', getpass.getpass())
>>> forms = api_client.iterate('form', {'app_id': "whatever"})
>>> [ (form['received_on'], form['form']['gender']) for form in forms ]
To issue a MiniLinq query against it, and then print out that query
in a JSON serialization:
::
>>> import getpass
>>> import json
>>> from commcare_export.minilinq import *
>>> from commcare_export.commcare_hq_client import CommCareHqClient
>>> from commcare_export.commcare_minilinq import CommCareHqEnv
>>> from commcare_export.env import BuiltInEnv, JsonPathEnv
>>> api_client = CommCareHqClient('http://commcarehq.org', domain='your_domain').authenticated('your_username', getpass.getpass())
>>> saved_query = Map(source=Apply(Reference("api_data"), [Literal("form"), Literal({"filter": {"term": {"app_id": "whatever"}}})]),
...                   body=List([Reference("received_on"), Reference("form.gender")]))
>>> forms = saved_query.eval(BuiltInEnv() | CommCareHqEnv(api_client) | JsonPathEnv())
>>> print(json.dumps(saved_query.to_jvalue(), indent=2))
Which will output JSON equivalent to this:
::
{
  "Map": {
    "source": {
      "Apply": {
        "fn": {"Ref": "api_data"},
        "args": [
          {"Lit": "form"},
          {"Lit": {"filter": {"term": {"app_id": "whatever"}}}}
        ]
      }
    },
    "body": {
      "List": [
        {"Ref": "received_on"},
        {"Ref": "form.gender"}
      ]
    }
  }
}
MiniLinq Reference
------------------
The abstract syntax can be directly inspected in the
``commcare_export.minilinq`` module. Note that the split between
functions and primitives is deliberate: it exposes the structure of the
MiniLinq for possible optimization and restricts the overall language.
Here is a description of the abstract syntax and semantics:

.. list-table::
   :header-rows: 1

   * - Python
     - JSON
     - What it evaluates to
   * - ``Literal(v)``
     - ``{"Lit": v}``
     - Just ``v``
   * - ``Reference(x)``
     - ``{"Ref": x}``
     - Whatever ``x`` resolves to in the environment
   * - ``List([a, b, c, ...])``
     - ``{"List": [a, b, c, ...]}``
     - The list of what ``a``, ``b``, ``c`` evaluate to
   * - ``Map(source, name, body)``
     - ``{"Map": {"source": ..., "name": ..., "body": ...}}``
     - Evaluates ``body`` for each element in ``source``. If ``name`` is provided, the element is bound to it; otherwise the element replaces the whole environment.
   * - ``FlatMap(source, name, body)``
     - ``{"FlatMap": {"source" ... etc}}``
     - Flattens after mapping, like nested list comprehensions
   * - ``Filter(source, name, body)``
     - etc
     - Keeps the elements of ``source`` for which ``body`` evaluates to true
   * - ``Bind(value, name, body)``
     - etc
     - Binds the result of ``value`` to ``name`` when evaluating ``body``
   * - ``Emit(table, headings, rows)``
     - etc
     - Emits ``table`` with ``headings`` and ``rows``. Note that ``table`` is a string, ``headings`` is a list of expressions, and ``rows`` is a list of lists of expressions. See Output Formats below for how emitted tables are written out.
   * - ``Apply(fn, args)``
     - etc
     - Evaluates ``fn`` to a function, and all of ``args``, then applies the function to the args.

Built-in functions such as ``api_data``, basic arithmetic, and comparison
are provided via the environment, referred to by name using ``Ref``, and
invoked via ``Apply``.
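
To make the semantics in the table more concrete, here is a toy evaluator for a few of the JSON forms (``Lit``, ``Ref``, ``List``, ``Apply``, ``Map``). It is a simplified sketch written for this document, not the code in ``commcare_export.minilinq``: the environment is a plain dictionary, ``Ref`` is a simple key lookup rather than a JSON path, and the ``add`` function and the sample data are made up.

```python
def evaluate(expr, env):
    """Evaluate a JSON-style MiniLinq expression against a dict environment."""
    # Each node is a single-key dict such as {"Lit": 1} or {"Ref": "x"}.
    kind, arg = next(iter(expr.items()))
    if kind == "Lit":
        return arg
    if kind == "Ref":
        return env[arg]
    if kind == "List":
        return [evaluate(item, env) for item in arg]
    if kind == "Apply":
        fn = evaluate(arg["fn"], env)
        args = [evaluate(a, env) for a in arg["args"]]
        return fn(*args)
    if kind == "Map":
        results = []
        for elem in evaluate(arg["source"], env):
            if arg.get("name") is not None:
                # Bind the element to the given name in a copy of the env.
                inner = dict(env)
                inner[arg["name"]] = elem
            else:
                # No name given: the element replaces the whole env.
                inner = elem
            results.append(evaluate(arg["body"], inner))
        return results
    raise ValueError("unsupported expression: %r" % kind)


# Made-up environment: one function and a list of form-like records.
env = {
    "add": lambda a, b: a + b,
    "forms": [
        {"received_on": "2013-01-01", "gender": "f"},
        {"received_on": "2013-01-02", "gender": "m"},
    ],
}

rows = evaluate(
    {"Map": {"source": {"Ref": "forms"},
             "body": {"List": [{"Ref": "received_on"}, {"Ref": "gender"}]}}},
    env,
)
total = evaluate(
    {"Apply": {"fn": {"Ref": "add"}, "args": [{"Lit": 1}, {"Lit": 2}]}},
    env,
)
print(rows)
print(total)
```

Representing every node as a single-key dictionary keeps the evaluator a flat chain of cases, which mirrors how the table pairs each Python constructor with one JSON form.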
Output Formats
--------------
Your MiniLinq may define multiple tables with headings in addition to
their body rows by using ``Emit`` expressions, or may simply return the
results of a single query.
If your MiniLinq does not contain any ``Emit`` expressions, then the
results of the expression will be printed to standard output as
pretty-printed JSON.
If your MiniLinq *does* contain ``Emit`` expressions, then several
formats are available, selected via the ``--output-format <format>``
option. Output can be directed to a file with the ``--output <file>``
command-line option.
- ``csv``: Each table will be a CSV file within a Zip archive.
- ``xls``: Each table will be a sheet in an old-format Excel
spreadsheet.
- ``xlsx``: Each table will be a sheet in a new-format Excel
spreadsheet.
- ``json``: The tables will each be a member of a JSON dictionary,
printed to standard output.
- ``markdown``: The tables will be streamed to standard output in
Markdown format (very handy for debugging your queries).
- ``sql``: All data will be idempotently “upserted” into the SQL
database you specify, including creating the needed tables and
columns.
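
The idea behind the idempotent ``sql`` output can be illustrated with the standard-library ``sqlite3`` module. This is only a sketch of the upsert concept, not the tool's own implementation. The function ``upsert_rows``, the table name, the choice of the first heading as the key column, and the sample rows are all assumptions made for this example.

```python
import sqlite3


def upsert_rows(conn, table, headings, rows):
    """Create `table` if needed, then insert or replace rows.

    The first heading is treated as the primary key (an assumption
    made for this sketch).
    """
    columns = ", ".join('"%s"' % h for h in headings)
    conn.execute(
        'CREATE TABLE IF NOT EXISTS "%s" (%s, PRIMARY KEY ("%s"))'
        % (table, columns, headings[0])
    )
    placeholders = ", ".join("?" for _ in headings)
    conn.executemany(
        'INSERT OR REPLACE INTO "%s" (%s) VALUES (%s)'
        % (table, columns, placeholders),
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
headings = ["id", "received_on", "gender"]
rows = [("a1", "2013-01-01", "f"), ("b2", "2013-01-02", "m")]

upsert_rows(conn, "registrations", headings, rows)
# Running the same sync again replaces the rows instead of duplicating them.
upsert_rows(conn, "registrations", headings, rows)

count = conn.execute('SELECT COUNT(*) FROM "registrations"').fetchone()[0]
print(count)
```

Because each row is keyed, re-running an export over the same data leaves the table unchanged, which is what makes repeated syncs safe.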
Dependencies
------------
Required dependencies are installed automatically by pip. Because you
may not need every export format, the dependencies specific to each
format are optional. Here is how you might install them:
::
# To export "xlsx"
$ pip install openpyxl
# To export "xls"
$ pip install xlwt
# To sync with a SQL database
$ pip install SQLAlchemy alembic
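
If you are unsure which optional packages are present, a short check along these lines may help. It is a sketch using the standard-library ``importlib.util.find_spec`` (Python 3.4 or later). The mapping ``OPTIONAL_MODULES`` and the function ``missing_modules`` are names invented for this example, built from the package list above.

```python
import importlib.util

# Import names of the optional packages listed above, per output format.
OPTIONAL_MODULES = {
    "xlsx": ["openpyxl"],
    "xls": ["xlwt"],
    "sql": ["sqlalchemy", "alembic"],
}


def missing_modules(output_format):
    """Return the optional modules for `output_format` that cannot be found."""
    return [
        name
        for name in OPTIONAL_MODULES.get(output_format, [])
        if importlib.util.find_spec(name) is None
    ]


for fmt in sorted(OPTIONAL_MODULES):
    missing = missing_modules(fmt)
    status = "ok" if not missing else "missing: " + ", ".join(missing)
    print(fmt, status)
```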
.. |Build status| image:: https://travis-ci.org/dimagi/commcare-export.png
   :target: https://travis-ci.org/dimagi/commcare-export