CRATE: clinical records anonymisation and text extraction
# CRATE
**Clinical Records Anonymisation and Text Extraction (CRATE)**
## Purpose
- Anonymises relational databases.
- Operates a GATE natural language processing (NLP) pipeline.
- Web app for
  - querying the anonymised database
  - managing a consent-to-contact process
## Key directories and files
- `crate_anon/anonymise/`
  - **`anonymise.py`** – core program
  - `launch_multiprocess_anonymiser.sh` – parallel processing
    (multiprocess) launcher for `anonymise.py`
  - `make_demo_database.py` – creates a demonstration database
  - `test_anonymisation.py` – generates a comparison of records between
    source and destination databases, to check anonymisation
- **`crate_anon/crateweb/`** – Django web application, as above
- **`docs/`** – documentation
- `crate_anon/nlp_manager/` – NLP interface tool
  - `buildjava.py` – script to compile the necessary Java source on your
    machine, and create a script to test the pipeline using the ANNIE demo
    GATE app
  - `CrateGatePipeline.java` – Java code to interface between
    `nlp_manager.py` (via stdin/stdout) and the Java-based external GATE tools
    (via code); must be compiled before use
  - `launch_multiprocess_nlp.py` – parallel processing (multiprocess)
    launcher for `nlp_manager.py`
  - `nlp_manager.py` – core program to pipe parts of a database to a GATE
    program and insert the output back into a database; uses
    `CrateGatePipeline.java` to communicate with the NLP app
- `tools/`
  - **`install_virtualenv.sh`** – creates a suitable virtualenv for CRATE
  - ...
- `changelog.Debian` – Debian package changelog and general version history
- `LICENCE` – Apache license applicable to CRATE
- `README.rst` – this file
- `setup.py` – file to set up package for distribution, etc.
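
The stdin/stdout arrangement described for `nlp_manager.py` and `CrateGatePipeline.java` can be illustrated with a minimal sketch. This is not CRATE's actual code or wire protocol: the child process here is a small Python stand-in for the Java program, and the names `CHILD_CODE` and `run_pipeline`, together with the one-line-per-document, tab-separated reply format, are assumptions made purely for illustration.

```python
import subprocess
import sys

# Hypothetical stand-in for the external NLP program. It reads one document
# per line on stdin and replies with "<word count>\t<upper-cased text>".
CHILD_CODE = r"""
import sys
for line in sys.stdin:
    text = line.rstrip("\n")
    print(f"{len(text.split())}\t{text.upper()}", flush=True)
"""


def run_pipeline(documents):
    """Send each document to the child process and collect its replies."""
    results = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        for doc in documents:
            # One document per line, so embedded newlines are flattened.
            proc.stdin.write(doc.replace("\n", " ") + "\n")
            proc.stdin.flush()
            # Read exactly one reply line for the document just sent.
            reply = proc.stdout.readline().rstrip("\n")
            word_count, upper = reply.split("\t", 1)
            results.append((int(word_count), upper))
    return results


if __name__ == "__main__":
    for row in run_pipeline(["hello world", "one two three"]):
        print(row)
```

In a real pipeline the rows returned by the child would be inserted into a destination database rather than printed; keeping the child process alive across many documents avoids paying its start-up cost for every record.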
## Copyright/licensing
- CRATE: copyright © 2015-2016 Rudolf Cardinal (rudolf@pobox.com).
- Licensed under the Apache License, version 2.0: see LICENSE file.
- Third-party code/libraries included:
  - aspects of `CrateGatePipeline.java` are based on demonstration GATE code,
    copyright © University of Sheffield, and licensed under the GNU LGPL
    (which license is therefore used for `nlp_manager/CrateGatePipeline.java`;
    q.v.).