OS-Climate Data Extraction Tool
This package provides an API and a Streamlit app that accept a PDF document and return its text content as JSON. The backend uses a Python module to extract text from PDFs; support for other file types may be added in the future. The JSON output is intended as input for transformer models that extract relevant information, but it can also be used on its own.
Quick start
Install via PyPI
You can simply install the package via:
$ pip install osc-transformer-presteps
Afterwards you can use the package as a CLI tool by typing:
$ osc-transformer-presteps
The CLI is built with typer. The tool itself shows all options and help text, so they are not described in more detail here.
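If you prefer to verify the installation from Python instead of the shell, the following is a minimal sketch. It assumes only that the distribution and the console command are both named osc-transformer-presteps, as in the commands above, and that the CLI accepts --help (which typer applications provide by default). The helper names installed_version and cli_help are illustrative and not part of the package.

```python
import shutil
import subprocess
from importlib import metadata


def installed_version(dist_name="osc-transformer-presteps"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def cli_help(command="osc-transformer-presteps"):
    """Return the CLI help text, or None if the command is not on PATH."""
    executable = shutil.which(command)
    if executable is None:
        return None
    # check=False: we only want the text, not an exception on a non-zero exit.
    result = subprocess.run(
        [executable, "--help"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    print("installed version:", installed_version())
    print(cli_help() or "CLI not found on PATH")
```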
Install via GitHub Repository
For a quick start with the tool, install Python and clone the repository to your local environment:
$ git clone https://github.com/os-climate/osc-transformer-presteps
Afterwards, install the project dependencies into your Python environment (for example via pdm update) and start a local API server via:
$ python ./src/run_server.py
Note: This assumes that your current directory is the root of the cloned repository.
To check that the server is running, open http://localhost:8000/liveness. You should see the message {"message": "OSC Transformer Pre-Steps Server is running."}.
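The same liveness check can be scripted. The sketch below uses only the Python standard library and the URL given above; the function names liveness_url and check_liveness are illustrative, and the default base URL assumes the server runs on localhost port 8000 as described.

```python
import json
import urllib.request


def liveness_url(base_url="http://localhost:8000"):
    """Build the liveness URL from the server's base URL."""
    return base_url.rstrip("/") + "/liveness"


def check_liveness(base_url="http://localhost:8000", timeout=5.0):
    """Return the parsed JSON body of the liveness endpoint, or None if unreachable."""
    try:
        with urllib.request.urlopen(liveness_url(base_url), timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection and HTTP errors; ValueError covers a non-JSON body.
        return None


if __name__ == "__main__":
    payload = check_liveness()
    if payload is None:
        print("Server not reachable; is run_server.py running?")
    else:
        print(payload)
```

Returning None instead of raising keeps the check easy to use in a polling loop while the server is still starting.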
Finally, run the following command to start a Streamlit app that lets you upload a PDF file and extract its content to JSON through a UI. The UI needs the running server, so run the Streamlit app and the server in two separate terminals:
$ streamlit run ./src/osc_transformer_presteps/streamlit/app.py
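As an alternative to two terminals, both processes can be started from one script. This is a sketch under assumptions, not part of the repository: it reuses the two commands shown above, assumes it is run from the root of the cloned repository, and polls the liveness URL before starting the UI. The names wait_until, server_is_live and main are illustrative.

```python
import subprocess
import sys
import time
import urllib.request

# Commands taken from the steps above; paths are relative to the repository root.
SERVER_CMD = [sys.executable, "./src/run_server.py"]
APP_CMD = ["streamlit", "run", "./src/osc_transformer_presteps/streamlit/app.py"]


def wait_until(predicate, timeout=30.0, interval=0.5):
    """Call predicate repeatedly until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def server_is_live(url="http://localhost:8000/liveness"):
    """Return True if the liveness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2.0) as response:
            return response.status == 200
    except OSError:
        return False


def main():
    """Start the API server, wait for it, then run the Streamlit app in the foreground."""
    server = subprocess.Popen(SERVER_CMD)
    try:
        if not wait_until(server_is_live):
            raise RuntimeError("server did not become live in time")
        subprocess.run(APP_CMD, check=False)
    finally:
        # Stop the server when the Streamlit process exits or startup fails.
        server.terminate()
        server.wait()

# Call main() from the repository root to launch both processes.
```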
Note: See also docs/demo. There, local_extraction_demo.py runs an extraction without any API call, and post_request_demo.py sends a file to the API (start the server as described above first).
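For orientation, sending a PDF to an HTTP API as a multipart upload might look like the sketch below. It is not a copy of post_request_demo.py. The endpoint URL is deliberately a required argument because the route is not stated here, and the form field name "file" and the JSON response are assumptions; take the real values from post_request_demo.py. The helpers build_multipart and post_pdf are illustrative names.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, payload, content_type="application/pdf"):
    """Encode one file as a multipart/form-data body; return (body, header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def post_pdf(url, pdf_path, field_name="file", timeout=60.0):
    """POST a PDF to url and return the decoded JSON response.

    url and field_name are assumptions to be replaced with the values
    used by the actual API.
    """
    path = Path(pdf_path)
    body, header = build_multipart(field_name, path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": header}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the body by hand keeps the sketch free of third-party dependencies; with the requests library the same upload is a single call with a files argument.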
Developer Notes
To add new dependencies, use pdm. First install it via pip:
$ pip install pdm
Then add new packages via pdm add. For example, to add numpy:
$ pdm add numpy
To run the linting tools and the tests, do the following:
$ pip install tox
$ tox -e lint
$ tox -e test