
A Python library for a learning health system


medkit


License: MIT | Formatter: Ruff | Project: Hatch

medkit is a toolkit for a learning health system, developed by the HeKA research team.

This Python library aims to:

  1. Facilitate the manipulation of healthcare data of various modalities (e.g., structured, text, audio data) for the extraction of relevant features.

  2. Develop supervised models from these various modalities for decision support in healthcare.

Installation

To install medkit with basic functionalities:

pip install medkit-lib

To install medkit with all its optional features (quote the argument so that shells such as zsh do not try to expand the square brackets):

pip install 'medkit-lib[all]'
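To confirm which version ended up installed, the standard library's importlib.metadata can be queried. This is a minimal sketch assuming the distribution name medkit-lib used in the commands above; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "medkit-lib") -> Optional[str]:
    """Return the installed version of `dist_name`, or None if it is not installed.

    `installed_version` is a hypothetical helper; the distribution name
    defaults to "medkit-lib", as in the pip commands above.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "medkit-lib is not installed")
```

The function looks up the distribution name (what you pass to pip), not the import name (medkit), which is why the two differ here.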

Example

A basic named-entity recognition pipeline using medkit:

# 1. Define individual operations.
from medkit.text.preprocessing import CharReplacer, LIGATURE_RULES, SIGN_RULES
from medkit.text.segmentation import SentenceTokenizer, SyntagmaTokenizer
from medkit.text.context.negation_detector import NegationDetector
from medkit.text.ner.hf_entity_matcher import HFEntityMatcher

# Preprocessing
char_replacer = CharReplacer(rules=LIGATURE_RULES + SIGN_RULES)
# Segmentation
sent_tokenizer = SentenceTokenizer(output_label="sentence")
synt_tokenizer = SyntagmaTokenizer(output_label="syntagma")
# Negation detection
neg_detector = NegationDetector(output_label="is_negated")
# Entity recognition ("my-BERT-model" is a placeholder for a Hugging Face model name or path)
entity_matcher = HFEntityMatcher(model="my-BERT-model", attrs_to_copy=["is_negated"])

# 2. Combine operations into a pipeline.
from medkit.core.pipeline import Pipeline, PipelineStep

ner_pipeline = Pipeline(
    input_keys=["full_text"],
    output_keys=["entities"],
    steps=[
        PipelineStep(char_replacer, input_keys=["full_text"], output_keys=["clean_text"]),
        PipelineStep(sent_tokenizer, input_keys=["clean_text"], output_keys=["sentences"]),
        PipelineStep(synt_tokenizer, input_keys=["sentences"], output_keys=["syntagmas"]),
        PipelineStep(neg_detector, input_keys=["syntagmas"], output_keys=[]),  # adds "is_negated" attributes to the syntagmas in place
        PipelineStep(entity_matcher, input_keys=["syntagmas"], output_keys=["entities"]),
    ],
)

# 3. Run the NER pipeline on documents loaded from a BRAT dataset.
from medkit.io import BratInputConverter

docs = BratInputConverter().load("/path/to/dataset/")  # directory of .txt/.ann file pairs
entities = ner_pipeline.run([doc.raw_segment for doc in docs])
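The pipeline above passes data between steps by name: each PipelineStep reads the values stored under its input_keys and stores its results under its output_keys, while a step with empty output_keys (the negation detector) only enriches its inputs in place. The stand-alone sketch below illustrates that wiring with plain Python functions. It is not medkit's implementation; MiniStep, run_keyed_pipeline, the toy lexicon and the toy operations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class MiniStep:
    """Hypothetical stand-in for a pipeline step: an operation plus its key wiring."""
    operation: Callable[..., Any]
    input_keys: List[str]
    output_keys: List[str] = field(default_factory=list)


def run_keyed_pipeline(steps: List[MiniStep], inputs: Dict[str, Any], output_keys: List[str]) -> List[Any]:
    """Run steps in order, routing values through a shared key -> value store."""
    store: Dict[str, Any] = dict(inputs)
    for step in steps:
        result = step.operation(*[store[key] for key in step.input_keys])
        if len(step.output_keys) == 1:
            store[step.output_keys[0]] = result
        elif step.output_keys:
            # Several output keys: expect one returned value per key.
            for key, value in zip(step.output_keys, result):
                store[key] = value
        # No output keys: the step only mutated its inputs in place.
    return [store[key] for key in output_keys]


VOCABULARY = ("fibrosis", "cough")  # hypothetical toy lexicon


def clean(text: str) -> str:
    # Toy ligature rule: replace the "fi" ligature (U+FB01) with plain "fi".
    return text.replace("\ufb01", "fi")


def split_sentences(text: str) -> List[Dict[str, Any]]:
    # Naive sentence split on periods; each segment carries an attribute dict.
    return [{"text": s.strip(), "attrs": {}} for s in text.split(".") if s.strip()]


def flag_negation(segments: List[Dict[str, Any]]) -> None:
    # Toy negation rule: a segment starting with "no " is negated. Returns nothing.
    for seg in segments:
        seg["attrs"]["is_negated"] = seg["text"].lower().startswith("no ")


def find_entities(segments: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    entities = []
    for seg in segments:
        for term in VOCABULARY:
            if term in seg["text"].lower():
                # Copy the segment's negation flag onto the entity (the intent of attrs_to_copy).
                entities.append({"text": term, "is_negated": seg["attrs"]["is_negated"]})
    return entities


steps = [
    MiniStep(clean, ["full_text"], ["clean_text"]),
    MiniStep(split_sentences, ["clean_text"], ["sentences"]),
    MiniStep(flag_negation, ["sentences"]),  # no output keys
    MiniStep(find_entities, ["sentences"], ["entities"]),
]
(entities,) = run_keyed_pipeline(steps, {"full_text": "No \ufb01brosis. Mild cough."}, ["entities"])
print(entities)
# → [{'text': 'fibrosis', 'is_negated': True}, {'text': 'cough', 'is_negated': False}]
```

Because the negation step writes into the same segment objects that the entity step later reads, ordering the steps correctly is what makes the copied attribute available.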

Getting started

To get started with medkit, please check out our documentation.

The documentation also contains tutorials and examples showing how to use medkit for various tasks.

Contributing

Thank you for your interest in medkit!

We'd be happy to hear your input!

If your problem has not already been reported by another user, please open an issue, whether it is for:

  • reporting a bug,
  • discussing the current state of the code,
  • submitting a fix,
  • proposing new features,
  • or contributing to the documentation, etc.

If you want to propose a pull request, please read CONTRIBUTING.md first.

Contact

Feel free to contact us by sending an email to medkit-maintainers@inria.fr.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

medkit_lib-0.13.1.tar.gz (675.7 kB)

Built Distribution

medkit_lib-0.13.1-py3-none-any.whl (287.5 kB)

File details

Details for the file medkit_lib-0.13.1.tar.gz.

File metadata

  • Download URL: medkit_lib-0.13.1.tar.gz
  • Size: 675.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for medkit_lib-0.13.1.tar.gz
Algorithm    Hash digest
SHA256       c6575606a81e5be216d667f8c659577bba83b47738286c6a2dd4d4a41bd920e0
MD5          c7412ce1c05a0e7a18dade17a3bb8e34
BLAKE2b-256  a8482acf86caebc5da9deb77e4ea11c128fbde7e460b7e9cb4540d2126154146

File details

Details for the file medkit_lib-0.13.1-py3-none-any.whl.

File metadata

  • Download URL: medkit_lib-0.13.1-py3-none-any.whl
  • Size: 287.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for medkit_lib-0.13.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       2edde825e65ec82e18a4f13dfb9a0a5fa0907f57cbdb7073b2d2e6e4a40217d0
MD5          00c6eafc1a2dc31d85fdf45dd8a2f346
BLAKE2b-256  6900350502e546c1d7d67c2517967ccb042937d922bcc24b31d54d9ccf9c864b
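
A downloaded file can be checked against the SHA256 digests listed above with Python's hashlib. This is a minimal sketch; the helper names sha256_of and matches_published_digest are hypothetical, and the wheel is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Hypothetical usage: the wheel is assumed to have been downloaded into the current directory.
wheel = Path("medkit_lib-0.13.1-py3-none-any.whl")
if wheel.exists():
    expected = "2edde825e65ec82e18a4f13dfb9a0a5fa0907f57cbdb7073b2d2e6e4a40217d0"
    print("OK" if matches_published_digest(wheel, expected) else "MISMATCH")
```

pip can also enforce digests itself through its hash-checking mode (--hash=sha256:... entries in a requirements file).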
