OS-Climate Data Extraction Tool

This package provides an API and a Streamlit app that accept a PDF document and return its text content in JSON format. The backend uses a Python module to extract text from PDFs, and it may be extended to other file types in the future. The JSON output is intended for later use with transformer models that extract relevant information, but it can also be used independently.
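Since the JSON output can be used on its own, a quick way to get oriented is to load a file and look at its top-level structure. The sketch below is only an illustration: it makes no assumption about the schema the tool produces, and the helper name summarize_json is hypothetical, not part of the package.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Map each top-level key of a JSON file to the type name of its value.

    Hypothetical helper: it does not assume any particular schema for the
    extraction output, it only reports what is found at the top level.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        return {key: type(value).__name__ for key, value in data.items()}
    # Non-object roots (for example a list) are reported under one label.
    return {"<root>": type(data).__name__}
```

Calling summarize_json on an extracted file returns a small dictionary that shows which keys are present before you feed the data to a model.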

Quick start

Install via PyPI

You can simply install the package via:

$ pip install osc-transformer-presteps

Afterwards you can use the tooling as a CLI tool by simply typing:

$ osc-transformer-presteps

The CLI is built with Typer. All details and help text are available in the CLI itself, so they are not described further here.

Install via GitHub Repository

For a quick start, install Python and clone the repository to your local environment:

$ git clone https://github.com/os-climate/osc-transformer-presteps

Afterwards, install the project's requirements into your Python environment (for example via pdm update) and start a local API server via:

$ python ./src/run_server.py
Note:
  • We assume that your working directory is the cloned repository.

  • To check that the server is running, open http://localhost:8000/liveness. You should see the message {"message": "OSC Transformer Pre-Steps Server is running."}.
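The same liveness check can be scripted instead of opened in a browser. The following is a minimal sketch using only the standard library. The URL comes from the note above. The function name check_liveness and the rule "a JSON object with a message key counts as alive" are assumptions made for illustration.

```python
import json
import urllib.request

LIVENESS_URL = "http://localhost:8000/liveness"


def check_liveness(url=LIVENESS_URL, timeout=5.0):
    """Return True if the endpoint answers with a JSON object holding "message".

    Hypothetical helper: connection errors, timeouts, and non-JSON replies
    are all treated as "not running".
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            payload = json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # JSON decoding problems are ValueError subclasses.
        return False
    return isinstance(payload, dict) and "message" in payload
```

With the server started as described above, check_liveness() should return True. It returns False if nothing is listening on port 8000.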

Finally, run the following command to start a Streamlit app, which lets you upload a PDF file and extract its content to JSON through a UI. The UI needs the running server, so run the Streamlit app and the server in two separate terminals:

$ streamlit run ./src/osc_transformer_presteps/streamlit/app.py
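If juggling two terminals is inconvenient, both processes can be started from one Python script. This is only a sketch of an alternative, not something the project ships: the two command lines are taken from the instructions above, while the helper names launch and stop are hypothetical.

```python
import subprocess
import sys

# Commands from the instructions above; run from the cloned repository root.
SERVER_CMD = [sys.executable, "./src/run_server.py"]
APP_CMD = ["streamlit", "run", "./src/osc_transformer_presteps/streamlit/app.py"]


def launch(commands):
    """Start each command as a child process and return the Popen handles."""
    return [subprocess.Popen(command) for command in commands]


def stop(processes):
    """Ask every child process to terminate, then wait for each to exit."""
    for process in processes:
        process.terminate()
    for process in processes:
        process.wait()


# Example usage (not executed here):
#   processes = launch([SERVER_CMD, APP_CMD])
#   ...work with the UI in the browser...
#   stop(processes)
```

Starting the server first matters because the UI depends on it. In practice you may want to wait until the liveness endpoint responds before opening the app.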

Note: See also docs/demo. It contains local_extraction_demo.py, which runs an extraction without any API call, and post_request_demo.py, which sends a file to the API (the server must be started first, as described above).

Developer Notes

To add new dependencies, use pdm. First install it via pip:

$ pip install pdm

Then add new packages via pdm add. For example, to add numpy:

$ pdm add numpy

To run the linting tools and the tests, do the following:

$ pip install tox
$ tox -e lint
$ tox -e test
