
Smart text extraction from PDF documents



EDS-PDF

EDS-PDF provides a modular framework to extract text from PDF documents.

You can use it out of the box, or extend it to fit your use case.

Getting started

Install the library with pip:

$ pip install edspdf
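
To confirm that the install succeeded, a minimal sketch such as the following can help. It relies only on the standard-library `importlib.metadata` module (Python 3.8+) and does not call any EDS-PDF API; the helper name `installed_version` is hypothetical and introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper: it only reads package metadata and never imports
    the package itself.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("edspdf")
    if found is None:
        print("edspdf is not installed in this environment")
    else:
        print(f"edspdf {found} is installed")
```

Reading metadata rather than importing the package keeps the check cheap and avoids triggering any import-time side effects.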

Visit the documentation for more information!

Citation

If you use EDS-PDF, please cite us as below.

@software{edspdf,
  author  = {Dura, Basile and Wajsburt, Perceval and Calliger, Alice and Gérardin, Christel and Bey, Romain},
  doi     = {10.5281/zenodo.6902977},
  license = {BSD-3-Clause},
  title   = {{EDS-PDF: Smart text extraction from PDF documents}},
  url     = {https://github.com/aphp/edspdf}
}

Acknowledgement

We would like to thank Assistance Publique – Hôpitaux de Paris and AP-HP Foundation for funding this project.


Download files


Source Distribution

edspdf-0.5.2.tar.gz (15.0 kB)

Uploaded Source

Built Distribution

edspdf-0.5.2-py3-none-any.whl (22.0 kB)

Uploaded Python 3

File details

Details for the file edspdf-0.5.2.tar.gz.

File metadata

  • Download URL: edspdf-0.5.2.tar.gz
  • Size: 15.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.15 CPython/3.8.10 Linux/5.15.0-1017-azure

File hashes

Hashes for edspdf-0.5.2.tar.gz

Algorithm    Hash digest
SHA256       50db3567db0745ea4d49ce1f2adf7c7daae7d21efde66f2745a6c441e0f00197
MD5          b8d9cc54ddd4acd690704e1ed6a32766
BLAKE2b-256  0d2c6b59af2e78211164b7395c8547251146780acf7fc61309aff3bb613733bb


File details

Details for the file edspdf-0.5.2-py3-none-any.whl.

File metadata

  • Download URL: edspdf-0.5.2-py3-none-any.whl
  • Size: 22.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.15 CPython/3.8.10 Linux/5.15.0-1017-azure

File hashes

Hashes for edspdf-0.5.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       3a68140724dae0e55219cda9ec02f75db0b0164dc112f65ec6d61015e751535d
MD5          441f56af5d97852caad816fb39480802
BLAKE2b-256  1732145b688b443cdd9c450deb5a57925aad798a76d4332bbca5ea53edcbdb00
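
The published digests can be used to check a manually downloaded file before installing it. The sketch below is one possible way to do so with the standard-library `hashlib` module; the SHA256 values are copied from the listings on this page, while the names `EXPECTED_SHA256`, `sha256_of` and `matches_published_hash` are hypothetical and introduced only for illustration.

```python
import hashlib
import hmac
import sys
from pathlib import Path

# SHA256 digests copied from the file listings on this page.
EXPECTED_SHA256 = {
    "edspdf-0.5.2.tar.gz":
        "50db3567db0745ea4d49ce1f2adf7c7daae7d21efde66f2745a6c441e0f00197",
    "edspdf-0.5.2-py3-none-any.whl":
        "3a68140724dae0e55219cda9ec02f75db0b0164dc112f65ec6d61015e751535d",
}


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    if expected is None:
        return False  # unknown file name: nothing to compare against
    return hmac.compare_digest(sha256_of(path), expected)


if __name__ == "__main__":
    # Usage: python verify.py path/to/edspdf-0.5.2.tar.gz
    for arg in sys.argv[1:]:
        status = "OK" if matches_published_hash(arg) else "MISMATCH"
        print(f"{arg}: {status}")
```

The lookup is keyed on the file name, so a renamed download is reported as a mismatch rather than silently accepted.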
