A Python library for a learning health system
medkit
medkit is a toolkit for a learning health system, developed by the HeKA research team.
This Python library aims at:
- Facilitating the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features.
- Developing supervised models from these various modalities for decision support in healthcare.
Installation
To install medkit with basic functionalities:
pip install medkit-lib
To install medkit with all its optional features (the quotes keep shells such as zsh from interpreting the square brackets):
pip install "medkit-lib[all]"
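To confirm that the installation succeeded, one option is to query the installed distribution's version from Python. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of medkit.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "medkit-lib" is the distribution name used in the pip commands above.
medkit_version = installed_version("medkit-lib")
print(medkit_version if medkit_version else "medkit-lib is not installed")
```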
Example
A basic named-entity recognition pipeline using medkit:
# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher
# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)
# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")
# Negation detection
neg_detector = NegationDetector(output_label="is_negated")
# Entity recognition
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])
# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep
ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)
# 3. Run the NER pipeline on a BRAT document.
from medkit.io import BratInputConverter
docs = BratInputConverter().load(path="/path/to/dataset/")
entities = ner_pipeline.run([doc.raw_segment for doc in docs])
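In the pipeline above, steps are connected by key names: the output key of one step is the input key of a later one, and the negation step declares no output keys, presumably because it annotates the syntagmas in place rather than producing new data. The toy sketch below illustrates that wiring idea in plain Python. It is not medkit's implementation; `Step`, `run_steps`, and the three small functions are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step (not medkit's PipelineStep)."""
    func: Callable[..., object]
    input_keys: List[str]
    output_keys: List[str]


def run_steps(steps: List[Step], inputs: Dict[str, object], output_key: str) -> object:
    """Run steps in order, passing data between them by key name."""
    data = dict(inputs)
    for step in steps:
        args = [data[key] for key in step.input_keys]
        result = step.func(*args)
        # A step without output keys only acts by side effect on its inputs.
        if len(step.output_keys) == 1:
            data[step.output_keys[0]] = result
    return data[output_key]


def split_sentences(text: str) -> List[dict]:
    """Naive sentence splitter on full stops."""
    return [{"text": part.strip()} for part in text.split(".") if part.strip()]


def tag_negation(chunks: List[dict]) -> None:
    """Annotate each chunk in place, returning nothing."""
    for chunk in chunks:
        chunk["is_negated"] = " no " in f" {chunk['text'].lower()} "


def keep_negated(chunks: List[dict]) -> List[dict]:
    """Select the chunks flagged by the previous step."""
    return [chunk for chunk in chunks if chunk["is_negated"]]


steps = [
    Step(lambda text: text.replace("œ", "oe"), ["full_text"], ["clean_text"]),
    Step(split_sentences, ["clean_text"], ["sentences"]),
    Step(tag_negation, ["sentences"], []),
    Step(keep_negated, ["sentences"], ["negated"]),
]

result = run_steps(steps, {"full_text": "No œdema. Patient has fever."}, "negated")
print(result)
```

Because the last step reads the same `sentences` key that the in-place step annotated, it sees the `is_negated` flag without any extra output key being declared.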
Getting started
To get started with medkit, please check out our documentation.
This documentation also contains tutorials and examples showcasing the use of medkit for different tasks.
Contributing
Thank you for your interest in medkit!
We'll be happy to get your input!
If your problem has not been reported by another user, please open an issue, whether it's for:
- reporting a bug,
- discussing the current state of the code,
- submitting a fix,
- proposing new features,
- or contributing to the documentation.
If you want to propose a pull request, you can read CONTRIBUTING.md.
Contact
Feel free to contact us by sending an email to medkit-maintainers@inria.fr.
File details
Details for the file medkit_lib-0.13.1.tar.gz.
File metadata
- Download URL: medkit_lib-0.13.1.tar.gz
- Upload date:
- Size: 675.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | c6575606a81e5be216d667f8c659577bba83b47738286c6a2dd4d4a41bd920e0
MD5 | c7412ce1c05a0e7a18dade17a3bb8e34
BLAKE2b-256 | a8482acf86caebc5da9deb77e4ea11c128fbde7e460b7e9cb4540d2126154146
File details
Details for the file medkit_lib-0.13.1-py3-none-any.whl.
File metadata
- Download URL: medkit_lib-0.13.1-py3-none-any.whl
- Upload date:
- Size: 287.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2edde825e65ec82e18a4f13dfb9a0a5fa0907f57cbdb7073b2d2e6e4a40217d0
MD5 | 00c6eafc1a2dc31d85fdf45dd8a2f346
BLAKE2b-256 | 6900350502e546c1d7d67c2517967ccb042937d922bcc24b31d54d9ccf9c864b
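The SHA256 digests listed above can be used to check a manually downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_hex` and `matches_digest` are hypothetical, and the self-check at the end runs on a throwaway file rather than on a real medkit archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_hex(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_hex(path) == expected_hex.strip().lower()


# Self-check on a temporary file with known content.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"medkit")
    expected = hashlib.sha256(b"medkit").hexdigest()
    check_passed = matches_digest(sample, expected)

print("digest matches" if check_passed else "digest mismatch")
```

For a real check, pass the local path of the downloaded file together with the SHA256 value from the matching table, for example `matches_digest(Path("medkit_lib-0.13.1-py3-none-any.whl"), "2edde825e65ec82e18a4f13dfb9a0a5fa0907f57cbdb7073b2d2e6e4a40217d0")`, where the local path is an assumption about where the file was saved.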