Find and extract content in PDFs converted to XML
# PDFCutter
There are better ways than storing data in a PDF.
**pdfcutter** is for when you need to get it out again.
It works on the XML output of `pdftohtml`, which is part of `poppler-utils`.
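```python
import pdfcutter

# Open the PDF to search
cutter = pdfcutter.PDFCutter(filename='./some.pdf')
# Select the text on page 1 that matches the label "Name:"
name_label = cutter.filter(page=1, search='Name:')
# From page 1, keep only what lies strictly to the right of that label, as text
name = cutter.filter(page=1).strictly_right_of(name_label).text()
```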
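The example works because `pdftohtml -xml` records where each piece of text sits on the page: `<page number="1">` elements contain `<text>` elements with `top`, `left`, `width` and `height` attributes. The sketch below is not pdfcutter's code. It uses only the standard library to illustrate the idea of geometric filtering on a small hand-written sample shaped like that XML. The helpers `Box`, `text_boxes` and `strictly_right_of` are hypothetical, and the rule used for "strictly right of" (starts at or beyond the label's right edge and overlaps it vertically) is an assumption that may differ from the library's behaviour.

```python
import xml.etree.ElementTree as ET
from typing import List, NamedTuple

# Hand-written sample shaped like `pdftohtml -xml` output (assumed and
# simplified: no DOCTYPE line, no fontspec elements).
SAMPLE_XML = """<pdf2xml>
<page number="1" position="absolute" top="0" left="0" height="1263" width="892">
<text top="100" left="50" width="60" height="15" font="0">Name:</text>
<text top="100" left="120" width="80" height="15" font="0">Jane Doe</text>
<text top="140" left="50" width="60" height="15" font="0">City:</text>
<text top="140" left="120" width="60" height="15" font="0">Berlin</text>
</page>
</pdf2xml>"""


class Box(NamedTuple):
    """One positioned run of text (hypothetical helper type)."""
    left: int
    top: int
    right: int
    bottom: int
    text: str


def text_boxes(xml_string: str, page: int) -> List[Box]:
    """Collect the <text> elements of one page with their bounding boxes."""
    root = ET.fromstring(xml_string)
    boxes = []
    for page_el in root.iter("page"):
        if int(page_el.get("number")) != page:
            continue
        for el in page_el.iter("text"):
            left, top = int(el.get("left")), int(el.get("top"))
            right = left + int(el.get("width"))
            bottom = top + int(el.get("height"))
            # itertext() also picks up text nested in <b> or <i> children
            content = "".join(el.itertext()).strip()
            boxes.append(Box(left, top, right, bottom, content))
    return boxes


def strictly_right_of(boxes: List[Box], label: Box) -> List[Box]:
    """Boxes starting at or past the label's right edge that overlap it vertically."""
    hits = [
        b for b in boxes
        if b.left >= label.right and b.top < label.bottom and b.bottom > label.top
    ]
    return sorted(hits, key=lambda b: b.left)  # read left to right


boxes = text_boxes(SAMPLE_XML, page=1)
label = next(b for b in boxes if b.text == "Name:")
name = " ".join(b.text for b in strictly_right_of(boxes, label))
print(name)  # Jane Doe
```

Matching on exact text equality keeps the sketch short; pdfcutter's `search=` may match differently.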
## Download files
- Source distribution: `pdfcutter-0.0.1.tar.gz` (7.5 kB)
- Built distribution: `pdfcutter-0.0.1-py3-none-any.whl` (8.3 kB)
### File details: pdfcutter-0.0.1.tar.gz

File metadata:

- Download URL: pdfcutter-0.0.1.tar.gz
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.18.4 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.5

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 178fce2b6cf8f27bb9ec8b21207c41c38af481ca23f266d61ee29f8444f0ed19 |
| MD5 | 1bb680cf315e0983202d3c0463f5e9fb |
| BLAKE2b-256 | 29cb73d52fd296d38bd45846a497a0bfc62a43b5cc2119d12cbf397228e6b23d |
### File details: pdfcutter-0.0.1-py3-none-any.whl

File metadata:

- Download URL: pdfcutter-0.0.1-py3-none-any.whl
- Size: 8.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.18.4 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.5

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | 99447f47302afcdb2fa60fc7012e367c7721be975042c89be55f5ce8a6889bbe |
| MD5 | 2b3bb4ffbfcc66206ec040134b1e0221 |
| BLAKE2b-256 | f6d23f62e276c25f57dfabeed74b6245fab67ba33d9b188cabff501608ffce87 |
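The SHA256 digests listed for the two 0.0.1 files can be checked locally before installing a downloaded copy. Below is a minimal sketch using only `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the file is read in chunks so large archives need not fit in memory.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed in the file hashes for pdfcutter 0.0.1
EXPECTED_SHA256 = {
    "pdfcutter-0.0.1.tar.gz":
        "178fce2b6cf8f27bb9ec8b21207c41c38af481ca23f266d61ee29f8444f0ed19",
    "pdfcutter-0.0.1-py3-none-any.whl":
        "99447f47302afcdb2fa60fc7012e367c7721be975042c89be55f5ce8a6889bbe",
}


def sha256_of(path, chunk_size=65536):
    """Hash a file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """True if the file's SHA256 matches the listed digest for its file name."""
    name = Path(path).name
    expected = EXPECTED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no listed digest for {name}")
    return sha256_of(path) == expected
```

`verify("pdfcutter-0.0.1.tar.gz")` returns `True` only for a byte-identical copy of the published archive.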